The goal of the analyses in this notebook is to identify predictors of frog survival following translocation. The analyses are conducted in a Bayesian framework using the R package rstanarm.

Model 1: Site-level and frog-level predictors

Load packages

library(tidyverse)
library(brms)
library(rstanarm)
library(bayesplot)
library(tidybayes)
library(loo)
library(broom.mixed)
library(janitor)

source("classification_summary.R")

Create frog translocation dataset for analysis

  • Read in data, create variables for analysis, standardize continuous variables.
  • The arm package is used to rescale continuous variables (subtract the mean, divide by 2 SD); binary variables are centered manually.
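As a sketch of what `arm::rescale` does to a continuous variable, using hypothetical elevation values (the actual analysis applies it to the columns of `d1`):

```r
# Gelman (2008) rescaling: subtract the mean, divide by 2 standard deviations,
# so a 1-unit change corresponds roughly to a "low" (-1 SD) to "high" (+1 SD) shift,
# making coefficients comparable to those of centered binary predictors.
x <- c(2950, 3000, 3050, 3100, 3150)   # hypothetical elevations (m)
x_z <- (x - mean(x)) / (2 * sd(x))
round(x_z, 3)
```

Scaling by 2 SD (rather than 1) is what puts continuous and binary predictors on a comparable footing in the standardized models below.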
d1 <- read_csv(here::here("data", "clean", "frog_translocation_final.csv")) %>% 
  drop_na(bd_load) %>%  # drop 12 records where bd_load is NA (9 are from 70413)
  mutate(
    first = if_else(order == 1, 1, 0),
    surv = if_else(survival < 0.5, 0, 1),
    male = if_else(sex == "m", 1, 0),
    elev_lo = if_else(elevation < 3020, 1, 0), # using interval defined by cut_interval, n = 2
    elev_z = arm::rescale(elevation),
    snowt_z = arm::rescale(snow_t),
    snowt1_z = arm::rescale(snow_t1),
    day_z = arm::rescale(day),
    length_z = arm::rescale(length),
    bdload_l = log10(bd_load + 1),
    bdload_z = arm::rescale(bdload_l),
    shore_c = shore - mean(shore),
    male_c = male - mean(male),
    elevlo_c = elev_lo - mean(elev_lo),
    first_c = first - mean(first),
    across(c(shore_c, male_c, elevlo_c, first_c), round, 3), 
    across(c(site_id, shore_c, first_c, donor, male_c, elevlo_c, year), as.factor),
    surv = as.integer(surv))
Rows: 779 Columns: 16
── Column specification ──
Delimiter: ","
chr   (2): pit_tag_ref, sex
dbl  (13): site_id, elevation, shore, snow_t, snow_t1, year, day, order, donor, length, weight, bd_load, survival
date  (1): date
# Add a descriptive translocation_id
d1 <- d1 %>% 
  distinct(site_id, year) %>% 
  arrange(site_id, year) %>% 
  group_by(site_id) %>% 
  mutate(transno = seq_len(n())) %>% 
  ungroup(site_id) %>% 
  unite(translocation_id, c("site_id", "transno"), remove = FALSE) %>% 
  select(-transno) %>% 
  mutate(translocation_id = as.factor(translocation_id)) %>% 
  inner_join(d1, by = c("site_id", "year")) %>% 
  relocate(translocation_id, .after = site_id)

Check for null values

d1 %>% summarize(across(everything(), ~ sum(is.na(.))))

Model using non-standardized predictors

Model specification

m1a <- stan_glm(
  surv ~ length + snow_t + snow_t1 + day + bdload_l + elevation + first + male + donor,
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2,
  cores = 4,
  seed = 84735)
[Stan sampling progress output omitted; 4 chains of 10,000 iterations each (5,000 warmup) completed in ~2.1–2.2 s per chain.]

Priors & chain diagnostics

prior_summary(m1a)
Priors for model 'm1a' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
  Adjusted prior:
    ~ normal(location = [0,0,0,...], scale = [0.526,0.049,0.073,...])
------
See help('prior_summary.stanreg') for more details
mcmc_trace(m1a, size = 0.1)

mcmc_dens_overlay(m1a)


summary(m1a)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      surv ~ length + snow_t + snow_t1 + day + bdload_l + elevation + 
       first + male + donor
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 767
 predictors:   11

Estimates:
              mean   sd    10%   50%   90%
(Intercept)  -8.5    2.3 -11.5  -8.5  -5.5
length        0.0    0.0   0.0   0.0   0.0
snow_t        0.0    0.0   0.0   0.0   0.0
snow_t1       0.0    0.0   0.0   0.0   0.0
day           0.0    0.0   0.0   0.0   0.0
bdload_l     -0.1    0.1  -0.1  -0.1   0.0
elevation     0.0    0.0   0.0   0.0   0.0
first         0.4    0.2   0.1   0.4   0.7
male          0.3    0.2   0.1   0.3   0.5
donor70567   -1.3    0.4  -1.9  -1.3  -0.8
donor72996   -2.2    0.4  -2.7  -2.2  -1.7

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.0  1.0  24916
length        0.0  1.0  25653
snow_t        0.0  1.0  22266
snow_t1       0.0  1.0  21140
day           0.0  1.0  22645
bdload_l      0.0  1.0  29642
elevation     0.0  1.0  24654
first         0.0  1.0  23563
male          0.0  1.0  28297
donor70567    0.0  1.0  20512
donor72996    0.0  1.0  18851
mean_PPD      0.0  1.0  23850
log-posterior 0.0  1.0   8254

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
  • MCMC diagnostics all look good.

List model variable names

get_variables(m1a)
 [1] "(Intercept)"   "length"        "snow_t"        "snow_t1"       "day"           "bdload_l"      "elevation"     "first"        
 [9] "male"          "donor70567"    "donor72996"    "accept_stat__" "stepsize__"    "treedepth__"   "n_leapfrog__"  "divergent__"  
[17] "energy__"     

Posterior predictive check

pp_check(m1a, plotfun = "stat") +
    xlab("Frog survival rate")

  • The posterior predictive distribution implied by the regression model closely matches the original data (based on mean frog survival).

Calculate leave-one-out cross-validation (loo)

loo_m1a <- loo(m1a, cores = 4, save_psis = TRUE)
print(loo_m1a)

Computed from 20000 by 767 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m1a, conf.int = TRUE, conf.level = 0.90)

mcmc_intervals(m1a, point_est = "median", prob = 0.80, prob_outer = 0.9) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  # add 0-line to accentuate default line
  ggtitle("Model with non-standardized predictors only",
          "Posterior distributions with medians & 90% uncertainty intervals")


mcmc_areas(m1a, area_method = "equal height", prob = 0.90) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with non-standardized predictors only",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m1a))
[1] 0.2728229
  • Using non-standardized predictors makes the plot hard to read because the predictors are on vastly different scales.
  • Based on 90% uncertainty intervals, the important predictors are snow_t1, elevation, first, donor70567, and donor72996.
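Because the model is on the logit scale, coefficients can be translated into odds ratios to aid interpretation. As a quick illustration using the posterior mean for donor72996 from `summary(m1a)` above (a full analysis would transform all posterior draws, e.g., `exp(as.matrix(m1a)[, "donor72996"])`):

```r
# Translate a logit-scale coefficient into an odds ratio.
# -2.2 is the posterior mean for donor72996 in the summary(m1a) output above.
beta_donor72996 <- -2.2
odds_ratio <- exp(beta_donor72996)
round(odds_ratio, 2)  # ~0.11: roughly a 9-fold reduction in the odds of survival
```

That is, frogs from donor site 72996 have roughly one-ninth the odds of surviving compared with frogs from the reference donor site, holding other predictors constant.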

Model using standardized predictors

Model specification

m1b <- stan_glm(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor,
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4, 
  seed = 84735)
[Stan sampling progress output omitted; 4 chains of 10,000 iterations each (5,000 warmup) completed in ~2.1–2.2 s per chain.]

Priors, chain diagnostics, posterior predictive check

prior_summary(m1b)
Priors for model 'm1b' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
  Adjusted prior:
    ~ normal(location = [0,0,0,...], scale = [5.00,5.00,5.00,...])
------
See help('prior_summary.stanreg') for more details
mcmc_trace(m1b, size = 0.1)

mcmc_dens_overlay(m1b)


summary(m1b)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + 
       first_c + male_c + donor
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 767
 predictors:   11

Estimates:
               mean   sd   10%   50%   90%
(Intercept)   0.4    0.4  0.0   0.4   0.9 
length_z      0.0    0.2 -0.3   0.0   0.3 
snowt_z       0.0    0.2 -0.3   0.0   0.3 
snowt1_z     -1.3    0.2 -1.6  -1.3  -1.0 
day_z         0.2    0.2 -0.1   0.2   0.4 
bdload_z     -0.2    0.2 -0.4  -0.2   0.1 
elev_z        1.5    0.3  1.2   1.5   1.9 
first_c0.353  0.4    0.2  0.1   0.4   0.7 
male_c0.548   0.3    0.2  0.1   0.3   0.5 
donor70567   -1.3    0.4 -1.9  -1.3  -0.8 
donor72996   -2.2    0.4 -2.7  -2.2  -1.7 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
              mcse Rhat n_eff
(Intercept)   0.0  1.0  21959
length_z      0.0  1.0  25653
snowt_z       0.0  1.0  22266
snowt1_z      0.0  1.0  21140
day_z         0.0  1.0  22645
bdload_z      0.0  1.0  29642
elev_z        0.0  1.0  24654
first_c0.353  0.0  1.0  23563
male_c0.548   0.0  1.0  28297
donor70567    0.0  1.0  20512
donor72996    0.0  1.0  18851
mean_PPD      0.0  1.0  23850
log-posterior 0.0  1.0   8254

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
  • MCMC diagnostics all look good.

List model variable names

get_variables(m1b)
 [1] "(Intercept)"   "length_z"      "snowt_z"       "snowt1_z"      "day_z"         "bdload_z"      "elev_z"        "first_c0.353" 
 [9] "male_c0.548"   "donor70567"    "donor72996"    "accept_stat__" "stepsize__"    "treedepth__"   "n_leapfrog__"  "divergent__"  
[17] "energy__"     

Posterior predictive check

pp_check(m1b, plotfun = "stat") +
    xlab("Frog survival rate")

  • The posterior predictive distribution implied by the regression model closely matches the original data (based on mean frog survival).

Calculate loo

loo_m1b <- loo(m1b, cores = 4, save_psis = TRUE)
print(loo_m1b)

Computed from 20000 by 767 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m1b, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1b, area_method = "equal height", prob = 0.90) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors only",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m1b))
[1] 0.2728229
  • As expected, standardized predictors allow easy comparison across predictors, and the set of important predictors is unchanged.
  • Based on 90% uncertainty intervals, the important predictors are snow_t1, elevation, first, donor70567, and donor72996.

Compare models

loo_compare(loo_m1a, loo_m1b)
    elpd_diff se_diff
m1a 0.0       0.0    
m1b 0.0       0.0    
  • As expected, models with non-standardized and standardized predictor variables have identical predictive accuracy.

Possible grouping structures to add to model

The dataset has three potential grouping variables: site_id, translocation_id, and donor site. Donor site is not suitable because it has only 3 levels, too few to estimate a group-level variance reliably. site_id and translocation_id contain 12 and 24 levels, respectively, making both suitable, and including them as nested grouping variables (translocations nested within site ids) also seems justified. Having more levels within a group should increase the power to detect effects of predictors; with fewer levels and less variability, an effect may be absorbed by the grouping variable itself rather than attributed to the predictor. With only 1-4 translocations per site and only 12 site ids in total, it is possible that the model with site_id as the sole grouping variable, and the model with site_id and translocation_id as nested grouping variables, will attribute more of the variation in frog survival to the grouping variables and less to the predictors. A priori, this suggests that the model with translocation_id as the grouping variable might be the most useful structure (assuming all models have similar posterior predictive accuracy). Lacking a strong rationale for one structure over another, I built all three models and compared their posterior predictive accuracy using loo.
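The three candidate grouping structures can be written as lme4-style formulas (a sketch; the nested form is shorthand that expands to `(1 | site_id) + (1 | site_id:translocation_id)`):

```r
# Shared fixed-effect specification used by all three candidate models
fixed <- "length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor"

# Random-intercept structures to compare via stan_glmer()
f_site   <- as.formula(paste("surv ~", fixed, "+ (1 | site_id)"))
f_transl <- as.formula(paste("surv ~", fixed, "+ (1 | translocation_id)"))
f_nested <- as.formula(paste("surv ~", fixed, "+ (1 | site_id / translocation_id)"))
```

Each formula would be passed to `stan_glmer()` with the same data, family, and sampler settings so that the resulting loo estimates are directly comparable.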

Model using standardized predictors and site_id grouping variable

Model specification

m1c <- stan_glmer(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor + (1 | site_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
[Stan sampling progress output omitted; chains 1 and 4 each rejected one initial value before starting, and all 4 chains completed 10,000 iterations in ~53–58 s per chain.]

Priors & chain diagnostics

prior_summary(m1c)
Priors for model 'm1c' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
  Adjusted prior:
    ~ normal(location = [0,0,0,...], scale = [5.00,5.00,5.00,...])

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
b1 <- c("(Intercept)", "length_z", "snowt_z", "snowt1_z", "day_z", "bdload_z", "elev_z", "first_c0.353", "male_c0.548", "donor70567", "donor72996")

mcmc_trace(m1c, size = 0.1, pars = b1)

mcmc_dens_overlay(m1c, pars = b1)


summary(m1c)

Model Info:
 function:     stan_glmer
 family:       binomial [logit]
 formula:      surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + 
       first_c + male_c + donor + (1 | site_id)
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 767
 groups:       site_id (12)

Estimates:
                                         mean   sd   10%   50%   90%
(Intercept)                             0.0    1.2 -1.5   0.0   1.3 
length_z                                0.0    0.2 -0.3   0.0   0.3 
snowt_z                                 0.0    0.4 -0.5   0.0   0.5 
snowt1_z                                0.0    0.5 -0.7  -0.1   0.7 
day_z                                   0.2    0.4 -0.4   0.2   0.7 
bdload_z                               -0.2    0.2 -0.5  -0.2   0.1 
elev_z                                  1.7    1.0  0.5   1.7   3.0 
first_c0.353                            0.3    0.2  0.0   0.3   0.7 
male_c0.548                             0.4    0.2  0.1   0.3   0.6 
donor70567                             -0.8    1.7 -2.8  -0.8   1.3 
donor72996                             -1.9    1.3 -3.4  -1.9  -0.3 
b[(Intercept) site_id:70134]           -0.4    0.6 -1.2  -0.4   0.3 
b[(Intercept) site_id:70370]           -1.7    1.0 -2.9  -1.6  -0.5 
b[(Intercept) site_id:70413]            0.0    1.3 -1.6   0.0   1.5 
b[(Intercept) site_id:70414]           -1.2    1.2 -2.7  -1.0   0.2 
b[(Intercept) site_id:70449]            1.5    0.6  0.8   1.5   2.3 
b[(Intercept) site_id:70505]            0.2    0.6 -0.5   0.2   1.0 
b[(Intercept) site_id:70550]            0.6    0.6 -0.2   0.5   1.4 
b[(Intercept) site_id:70556]           -0.9    1.0 -2.2  -0.9   0.4 
b[(Intercept) site_id:70619]           -0.6    0.7 -1.5  -0.5   0.3 
b[(Intercept) site_id:70628]            1.1    0.8  0.2   1.1   2.1 
b[(Intercept) site_id:70641]            0.0    0.8 -0.9   0.0   1.0 
b[(Intercept) site_id:74976]            1.0    1.0 -0.2   0.9   2.3 
Sigma[site_id:(Intercept),(Intercept)]  1.9    1.4  0.6   1.5   3.5 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                       mcse Rhat n_eff
(Intercept)                            0.0  1.0   9604
length_z                               0.0  1.0  29640
snowt_z                                0.0  1.0  12471
snowt1_z                               0.0  1.0   9663
day_z                                  0.0  1.0   9895
bdload_z                               0.0  1.0  29229
elev_z                                 0.0  1.0   8370
first_c0.353                           0.0  1.0  25887
male_c0.548                            0.0  1.0  30649
donor70567                             0.0  1.0  12665
donor72996                             0.0  1.0   9205
b[(Intercept) site_id:70134]           0.0  1.0  12243
b[(Intercept) site_id:70370]           0.0  1.0   9052
b[(Intercept) site_id:70413]           0.0  1.0  16783
b[(Intercept) site_id:70414]           0.0  1.0  13633
b[(Intercept) site_id:70449]           0.0  1.0   9662
b[(Intercept) site_id:70505]           0.0  1.0  10089
b[(Intercept) site_id:70550]           0.0  1.0   9125
b[(Intercept) site_id:70556]           0.0  1.0  13480
b[(Intercept) site_id:70619]           0.0  1.0   9114
b[(Intercept) site_id:70628]           0.0  1.0   9365
b[(Intercept) site_id:70641]           0.0  1.0   8782
b[(Intercept) site_id:74976]           0.0  1.0  10720
Sigma[site_id:(Intercept),(Intercept)] 0.0  1.0   7013
mean_PPD                               0.0  1.0  18973
log-posterior                          0.1  1.0   5668

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Posterior predictive check

pp_check(m1c, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m1c <- loo(m1c, cores = 4, save_psis = TRUE)
print(loo_m1c)

Computed from 20000 by 767 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m1c, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1c, area_method = "equal height", prob = 0.90, pars = b1) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors & site_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m1c))
[1] 0.3169546
# Frog survival by site_id
mcmc_areas_ridges(m1c, regex_pars = "site_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each recipient site_id")

  • Based on 90% uncertainty intervals, the set of important predictors reduces to elevation (elev_z) and sex (male_c).

Model using standardized predictors and translocation_id grouping variable

Model specification

m1d <- stan_glmer(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor + (1 | translocation_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
starting worker pid=31860 on localhost:11161 at 14:10:21.896
(3 additional workers started)

SAMPLING FOR MODEL 'bernoulli' NOW (CHAINS 1-4).
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
[iteration progress omitted: 4 chains, 5000 warmup + 5000 sampling iterations each; all chains completed]
Chain 1:  Elapsed Time: 33.9311 seconds (Total)
Chain 2:  Elapsed Time: 35.195 seconds (Total)
Chain 3:  Elapsed Time: 35.0229 seconds (Total)
Chain 4:  Elapsed Time: 35.2332 seconds (Total)

Priors & chain diagnostics

prior_summary(m1d)
Priors for model 'm1d' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
  Adjusted prior:
    ~ normal(location = [0,0,0,...], scale = [5.00,5.00,5.00,...])

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
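The "Adjusted prior" scales of 5.00 follow from rstanarm's autoscaling: with autoscale = TRUE (the default), each coefficient's prior scale of 2.5 is divided by the predictor's standard deviation. The arm::rescale variables have SD 0.5 by construction (divide by 2 SD), and a roughly balanced 0/1 indicator also has SD near 0.5, so 2.5 / 0.5 = 5.00. A quick arithmetic sketch with a hypothetical balanced binary predictor:

```r
# rstanarm autoscaling sketch: prior scale 2.5 is divided by sd(x)
x <- rep(c(0, 1), each = 50)  # hypothetical balanced binary predictor
2.5 / sd(x)                   # approximately 5, matching the adjusted scale above
```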
b1 <- c("(Intercept)", "length_z", "snowt_z", "snowt1_z", "day_z", "bdload_z", "elev_z", "first_c0.353", "male_c0.548", "donor70567", "donor72996")

mcmc_trace(m1d, size = 0.1, pars = b1)

mcmc_dens_overlay(m1d, pars = b1)


summary(m1d)

Model Info:
 function:     stan_glmer
 family:       binomial [logit]
 formula:      surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + 
       first_c + male_c + donor + (1 | translocation_id)
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 767
 groups:       translocation_id (24)

Estimates:
                                                  mean   sd   10%   50%   90%
(Intercept)                                      0.3    0.7 -0.6   0.3   1.2 
length_z                                         0.0    0.2 -0.3   0.0   0.3 
snowt_z                                          0.0    0.6 -0.7   0.0   0.7 
snowt1_z                                        -1.2    0.5 -1.9  -1.2  -0.5 
day_z                                            0.1    0.5 -0.6   0.1   0.8 
bdload_z                                        -0.2    0.2 -0.5  -0.2   0.0 
elev_z                                           1.6    0.6  0.8   1.6   2.4 
first_c0.353                                     0.5    0.5 -0.1   0.5   1.1 
male_c0.548                                      0.3    0.2  0.1   0.3   0.6 
donor70567                                      -1.3    0.9 -2.4  -1.3  -0.2 
donor72996                                      -2.2    0.8 -3.2  -2.2  -1.3 
b[(Intercept) translocation_id:70134_1]         -0.6    0.5 -1.3  -0.6   0.1 
b[(Intercept) translocation_id:70370_1]         -0.6    0.8 -1.6  -0.5   0.4 
b[(Intercept) translocation_id:70370_2]          0.0    0.7 -0.9   0.0   0.9 
b[(Intercept) translocation_id:70413_1]         -0.1    0.7 -1.0  -0.1   0.8 
b[(Intercept) translocation_id:70413_2]          0.4    0.7 -0.5   0.4   1.3 
b[(Intercept) translocation_id:70413_3]         -0.3    0.8 -1.3  -0.3   0.6 
b[(Intercept) translocation_id:70414_1]         -0.9    0.8 -2.0  -0.9   0.1 
b[(Intercept) translocation_id:70449_1]          0.1    0.6 -0.7   0.1   0.9 
b[(Intercept) translocation_id:70449_2]          1.6    0.6  0.9   1.6   2.4 
b[(Intercept) translocation_id:70505_1]          0.1    0.5 -0.6   0.1   0.8 
b[(Intercept) translocation_id:70505_2]          0.1    0.6 -0.7   0.1   0.9 
b[(Intercept) translocation_id:70505_3]          0.2    0.7 -0.7   0.2   1.0 
b[(Intercept) translocation_id:70505_4]         -0.1    0.6 -0.9  -0.1   0.7 
b[(Intercept) translocation_id:70550_1]          0.5    0.5 -0.2   0.5   1.2 
b[(Intercept) translocation_id:70550_2]         -0.1    0.6 -0.9  -0.1   0.6 
b[(Intercept) translocation_id:70556_1]         -0.5    0.7 -1.4  -0.5   0.4 
b[(Intercept) translocation_id:70556_2]         -0.8    0.7 -1.8  -0.8   0.0 
b[(Intercept) translocation_id:70619_1]         -0.5    0.6 -1.2  -0.5   0.2 
b[(Intercept) translocation_id:70628_1]          0.8    0.6  0.0   0.8   1.6 
b[(Intercept) translocation_id:70641_1]          0.2    0.7 -0.7   0.2   1.1 
b[(Intercept) translocation_id:70641_2]         -0.3    0.7 -1.1  -0.2   0.6 
b[(Intercept) translocation_id:70641_3]         -0.7    0.7 -1.7  -0.7   0.2 
b[(Intercept) translocation_id:74976_1]          1.4    0.8  0.4   1.3   2.4 
b[(Intercept) translocation_id:74976_2]          0.0    0.7 -0.8   0.0   0.9 
Sigma[translocation_id:(Intercept),(Intercept)]  0.9    0.5  0.4   0.8   1.6 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                                mcse Rhat n_eff
(Intercept)                                     0.0  1.0  10818
length_z                                        0.0  1.0  25683
snowt_z                                         0.0  1.0  10328
snowt1_z                                        0.0  1.0  12115
day_z                                           0.0  1.0  10672
bdload_z                                        0.0  1.0  28564
elev_z                                          0.0  1.0  11059
first_c0.353                                    0.0  1.0  11342
male_c0.548                                     0.0  1.0  26534
donor70567                                      0.0  1.0  10846
donor72996                                      0.0  1.0  10753
b[(Intercept) translocation_id:70134_1]         0.0  1.0  14706
b[(Intercept) translocation_id:70370_1]         0.0  1.0  15110
b[(Intercept) translocation_id:70370_2]         0.0  1.0  17141
b[(Intercept) translocation_id:70413_1]         0.0  1.0  13738
b[(Intercept) translocation_id:70413_2]         0.0  1.0  14123
b[(Intercept) translocation_id:70413_3]         0.0  1.0  16112
b[(Intercept) translocation_id:70414_1]         0.0  1.0  17969
b[(Intercept) translocation_id:70449_1]         0.0  1.0  12532
b[(Intercept) translocation_id:70449_2]         0.0  1.0  13621
b[(Intercept) translocation_id:70505_1]         0.0  1.0  14823
b[(Intercept) translocation_id:70505_2]         0.0  1.0  18847
b[(Intercept) translocation_id:70505_3]         0.0  1.0  20672
b[(Intercept) translocation_id:70505_4]         0.0  1.0  17878
b[(Intercept) translocation_id:70550_1]         0.0  1.0  11544
b[(Intercept) translocation_id:70550_2]         0.0  1.0  15234
b[(Intercept) translocation_id:70556_1]         0.0  1.0  14110
b[(Intercept) translocation_id:70556_2]         0.0  1.0  14298
b[(Intercept) translocation_id:70619_1]         0.0  1.0  12428
b[(Intercept) translocation_id:70628_1]         0.0  1.0  11195
b[(Intercept) translocation_id:70641_1]         0.0  1.0  11049
b[(Intercept) translocation_id:70641_2]         0.0  1.0  18120
b[(Intercept) translocation_id:70641_3]         0.0  1.0  16196
b[(Intercept) translocation_id:74976_1]         0.0  1.0  14344
b[(Intercept) translocation_id:74976_2]         0.0  1.0  14465
Sigma[translocation_id:(Intercept),(Intercept)] 0.0  1.0   8036
mean_PPD                                        0.0  1.0  19612
log-posterior                                   0.1  1.0   5727

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Posterior predictive check

pp_check(m1d, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m1d <- loo(m1d, cores = 4, save_psis = TRUE)
print(loo_m1d)

Computed from 20000 by 767 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m1d, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1d, area_method = "equal height", prob = 0.90, pars = b1) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors & translocation_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m1d))
[1] 0.322301
# Frog survival by translocation_id
mcmc_areas_ridges(m1d, regex_pars = "translocation_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each translocation_id")

  • Based on loo, model with translocation_id as grouping variable provides slightly better fit than model with site_id as grouping variable.
  • Based on 90% uncertainty intervals, important predictors are similar to those in model without grouping variable: snowt1_z, elev_z, male_c0.548, donor72996.

Model using standardized predictors and 2 grouping variables: translocation_id nested within site_id

Model specification

m1e <- stan_glmer(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor + (1 | translocation_id) + (1 | site_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
starting worker pid=32198 on localhost:11161 at 14:11:31.853
(3 additional workers started)

SAMPLING FOR MODEL 'bernoulli' NOW (CHAINS 1-4).
[iteration progress omitted: 4 chains, 5000 warmup + 5000 sampling iterations each; all chains completed]
Chain 1:  Elapsed Time: 76.2694 seconds (Total)
Chain 2:  Elapsed Time: 63.875 seconds (Total)
Chain 3:  Elapsed Time: 63.099 seconds (Total)
Chain 4:  Elapsed Time: 76.4328 seconds (Total)

Priors & chain diagnostics

prior_summary(m1e)
Priors for model 'm1e' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
  Adjusted prior:
    ~ normal(location = [0,0,0,...], scale = [5.00,5.00,5.00,...])

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
b1 <- c("(Intercept)", "length_z", "snowt_z", "snowt1_z", "day_z", "bdload_z", "elev_z", "first_c0.353", "male_c0.548", "donor70567", "donor72996")

mcmc_trace(m1e, size = 0.1, pars = b1)

mcmc_dens_overlay(m1e, pars = b1)


summary(m1e)

Model Info:
 function:     stan_glmer
 family:       binomial [logit]
 formula:      surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + 
       first_c + male_c + donor + (1 | translocation_id) + (1 | 
       site_id)
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 767
 groups:       translocation_id (24), site_id (12)

Estimates:
                                                  mean   sd   10%   50%   90%
(Intercept)                                      0.1    1.0 -1.2   0.1   1.3 
length_z                                         0.0    0.2 -0.3   0.0   0.3 
snowt_z                                         -0.1    0.5 -0.7  -0.1   0.6 
snowt1_z                                        -0.5    0.7 -1.5  -0.6   0.4 
day_z                                            0.2    0.5 -0.5   0.2   0.9 
bdload_z                                        -0.2    0.2 -0.5  -0.2   0.0 
elev_z                                           1.7    0.9  0.6   1.7   2.9 
first_c0.353                                     0.4    0.4  0.0   0.4   0.9 
male_c0.548                                      0.3    0.2  0.1   0.3   0.6 
donor70567                                      -1.0    1.5 -2.7  -1.1   0.8 
donor72996                                      -2.0    1.1 -3.4  -2.1  -0.7 
b[(Intercept) translocation_id:70134_1]         -0.2    0.5 -0.9  -0.1   0.4 
b[(Intercept) translocation_id:70370_1]         -0.4    0.6 -1.1  -0.3   0.2 
b[(Intercept) translocation_id:70370_2]          0.1    0.5 -0.5   0.1   0.7 
b[(Intercept) translocation_id:70413_1]          0.0    0.5 -0.6   0.0   0.6 
b[(Intercept) translocation_id:70413_2]          0.2    0.5 -0.4   0.1   0.9 
b[(Intercept) translocation_id:70413_3]         -0.2    0.6 -0.9  -0.2   0.4 
b[(Intercept) translocation_id:70414_1]         -0.3    0.7 -1.2  -0.2   0.3 
b[(Intercept) translocation_id:70449_1]         -0.1    0.5 -0.7  -0.1   0.5 
b[(Intercept) translocation_id:70449_2]          0.7    0.7  0.0   0.6   1.7 
b[(Intercept) translocation_id:70505_1]          0.0    0.5 -0.5   0.0   0.6 
b[(Intercept) translocation_id:70505_2]          0.1    0.5 -0.5   0.1   0.7 
b[(Intercept) translocation_id:70505_3]          0.1    0.5 -0.5   0.0   0.7 
b[(Intercept) translocation_id:70505_4]         -0.1    0.5 -0.7  -0.1   0.5 
b[(Intercept) translocation_id:70550_1]          0.3    0.5 -0.3   0.2   0.9 
b[(Intercept) translocation_id:70550_2]         -0.1    0.5 -0.7  -0.1   0.4 
b[(Intercept) translocation_id:70556_1]         -0.2    0.6 -0.9  -0.1   0.4 
b[(Intercept) translocation_id:70556_2]         -0.2    0.6 -1.0  -0.1   0.4 
b[(Intercept) translocation_id:70619_1]         -0.2    0.5 -0.9  -0.1   0.4 
b[(Intercept) translocation_id:70628_1]          0.3    0.6 -0.3   0.2   1.1 
b[(Intercept) translocation_id:70641_1]          0.2    0.5 -0.4   0.1   0.9 
b[(Intercept) translocation_id:70641_2]         -0.1    0.5 -0.7   0.0   0.5 
b[(Intercept) translocation_id:70641_3]         -0.3    0.6 -1.1  -0.2   0.2 
b[(Intercept) translocation_id:74976_1]          0.5    0.7 -0.1   0.4   1.5 
b[(Intercept) translocation_id:74976_2]         -0.1    0.5 -0.7  -0.1   0.5 
b[(Intercept) site_id:70134]                    -0.3    0.6 -1.1  -0.3   0.4 
b[(Intercept) site_id:70370]                    -0.9    1.0 -2.3  -0.7   0.2 
b[(Intercept) site_id:70413]                     0.0    1.1 -1.3   0.0   1.2 
b[(Intercept) site_id:70414]                    -0.7    1.1 -2.1  -0.5   0.3 
b[(Intercept) site_id:70449]                     1.0    0.8  0.0   1.0   2.0 
b[(Intercept) site_id:70505]                     0.2    0.6 -0.5   0.1   0.9 
b[(Intercept) site_id:70550]                     0.3    0.6 -0.4   0.2   1.1 
b[(Intercept) site_id:70556]                    -0.7    0.9 -1.8  -0.5   0.3 
b[(Intercept) site_id:70619]                    -0.4    0.7 -1.3  -0.3   0.4 
b[(Intercept) site_id:70628]                     0.6    0.8 -0.2   0.5   1.7 
b[(Intercept) site_id:70641]                    -0.1    0.7 -1.0  -0.1   0.7 
b[(Intercept) site_id:74976]                     0.7    0.9 -0.2   0.6   1.9 
Sigma[translocation_id:(Intercept),(Intercept)]  0.4    0.4  0.0   0.3   0.9 
Sigma[site_id:(Intercept),(Intercept)]           1.2    1.3  0.1   0.8   2.6 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                                mcse Rhat n_eff
(Intercept)                                     0.0  1.0   8760
length_z                                        0.0  1.0  26927
snowt_z                                         0.0  1.0  12055
snowt1_z                                        0.0  1.0   5542
day_z                                           0.0  1.0  10670
bdload_z                                        0.0  1.0  28589
elev_z                                          0.0  1.0  10334
first_c0.353                                    0.0  1.0  12870
male_c0.548                                     0.0  1.0  27066
donor70567                                      0.0  1.0   9522
donor72996                                      0.0  1.0   8564
b[(Intercept) translocation_id:70134_1]         0.0  1.0  10178
b[(Intercept) translocation_id:70370_1]         0.0  1.0  13784
b[(Intercept) translocation_id:70370_2]         0.0  1.0  21315
b[(Intercept) translocation_id:70413_1]         0.0  1.0  18159
b[(Intercept) translocation_id:70413_2]         0.0  1.0  16405
b[(Intercept) translocation_id:70413_3]         0.0  1.0  18489
b[(Intercept) translocation_id:70414_1]         0.0  1.0   8310
b[(Intercept) translocation_id:70449_1]         0.0  1.0  15651
b[(Intercept) translocation_id:70449_2]         0.0  1.0   4203
b[(Intercept) translocation_id:70505_1]         0.0  1.0  18690
b[(Intercept) translocation_id:70505_2]         0.0  1.0  23587
b[(Intercept) translocation_id:70505_3]         0.0  1.0  23491
b[(Intercept) translocation_id:70505_4]         0.0  1.0  22403
b[(Intercept) translocation_id:70550_1]         0.0  1.0  12418
b[(Intercept) translocation_id:70550_2]         0.0  1.0  19758
b[(Intercept) translocation_id:70556_1]         0.0  1.0  14624
b[(Intercept) translocation_id:70556_2]         0.0  1.0   7455
b[(Intercept) translocation_id:70619_1]         0.0  1.0  11194
b[(Intercept) translocation_id:70628_1]         0.0  1.0   8502
b[(Intercept) translocation_id:70641_1]         0.0  1.0  15452
b[(Intercept) translocation_id:70641_2]         0.0  1.0  20501
b[(Intercept) translocation_id:70641_3]         0.0  1.0   9364
b[(Intercept) translocation_id:74976_1]         0.0  1.0   5537
b[(Intercept) translocation_id:74976_2]         0.0  1.0  20547
b[(Intercept) site_id:70134]                    0.0  1.0  12998
b[(Intercept) site_id:70370]                    0.0  1.0   5422
b[(Intercept) site_id:70413]                    0.0  1.0  14150
b[(Intercept) site_id:70414]                    0.0  1.0   9926
b[(Intercept) site_id:70449]                    0.0  1.0   4422
b[(Intercept) site_id:70505]                    0.0  1.0  10980
b[(Intercept) site_id:70550]                    0.0  1.0   9459
b[(Intercept) site_id:70556]                    0.0  1.0  10122
b[(Intercept) site_id:70619]                    0.0  1.0   9424
b[(Intercept) site_id:70628]                    0.0  1.0   6708
b[(Intercept) site_id:70641]                    0.0  1.0   9372
b[(Intercept) site_id:74976]                    0.0  1.0   7086
Sigma[translocation_id:(Intercept),(Intercept)] 0.0  1.0   4082
Sigma[site_id:(Intercept),(Intercept)]          0.0  1.0   4867
mean_PPD                                        0.0  1.0  19556
log-posterior                                   0.1  1.0   5917

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Posterior predictive check

pp_check(m1e, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m1e <- loo(m1e, cores = 4, save_psis = TRUE)
print(loo_m1e)

Computed from 20000 by 767 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m1e, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1e, area_method = "equal height", prob = 0.90, pars = b1) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors & translocation_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m1e))
[1] 0.3232688
# Frog survival by translocation_id
mcmc_areas_ridges(m1e, regex_pars = "translocation_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each translocation_id")


# Frog survival by site_id
mcmc_areas_ridges(m1e, regex_pars = "site_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each site_id")

Compare 4 models with different grouping structures

loo_compare(loo_m1b, loo_m1c, loo_m1d, loo_m1e)
    elpd_diff se_diff
m1c   0.0       0.0  
m1e  -0.5       1.9  
m1d  -1.1       3.1  
m1b -18.2       6.3  
  • Based on expected log predictive densities (ELPD), the 3 models with grouping variables all have substantially higher posterior predictive accuracy than the model without a grouping variable (for m1b relative to m1c, elpd_diff = -18.2 \(\pm\) 12.6 (2 SE); the 2 SE interval, -30.8 to -5.6, does not overlap 0).
  • The 3 models with 1 or more grouping variables have similar posterior predictive accuracy (elpd_diff of m1e and m1d are within 2 SE of that of m1c).
  • m1d seems to provide more insight into the effects of the predictors, so this model was chosen for additional analysis.
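The "2 SE" heuristic applied above can be made explicit. A small arithmetic sketch using the elpd_diff and se_diff values from the loo_compare() table:

```r
# 2 SE rule of thumb applied to the loo_compare() output above
elpd_diff <- c(m1e = -0.5, m1d = -1.1, m1b = -18.2)
se_diff   <- c(m1e =  1.9, m1d =  3.1, m1b =  6.3)
elpd_diff + 2 * se_diff  # upper bound of the 2 SE interval; < 0 favors m1c
# m1b's interval (-30.8, -5.6) excludes 0; m1d's and m1e's do not
```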

Additional analysis of model m1d

Posterior classification accuracy

# Posterior predictive models for each frog in dataset
set.seed(84735)
survival_predict_1 <- posterior_predict(m1d, newdata = d1)

# Classify frog survival based on mean predicted survival probability
survival_predict_2 <- d1 %>% 
  mutate(survival_prob = colMeans(survival_predict_1),
         survival_class = as.numeric(survival_prob >= 0.5)) %>% 
  select(site_id, translocation_id, year, snowt1_z, elev_z, male_c, donor, surv, survival_prob, survival_class)

# Generate confusion matrix and calculate accuracy
survival_predict_2 %>% 
  tabyl(surv, survival_class) %>% 
  adorn_totals(c("row", "col"))
  surv   0   1 Total
     0 471  62   533
     1  96 138   234
 Total 567 200   767
(471 + 138)/767 # overall_accuracy 
[1] 0.7940026
471/533 # true negative rate
[1] 0.8836773
138/234 # true positive rate
[1] 0.5897436
# Generate confusion matrix using function
set.seed(84735)
classification_summary(m1d, data = d1, cutoff = 0.5)
$confusion_matrix
 y   0   1
 0 471  62
 1  96 138

$accuracy_rates
NA
  • Sensitivity, or “true positive rate”, measures the proportion of Y = 1 observations that are accurately classified (i.e., proportion of frogs correctly identified as surviving).
  • Specificity, or “true negative rate”, measures the proportion of Y = 0 observations that are accurately classified (i.e., proportion of frogs correctly identified as not surviving).
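These accuracy rates follow directly from the confusion matrix above; a sketch of the arithmetic:

```r
# Accuracy rates computed from the confusion matrix above
tn <- 471; fp <- 62; fn <- 96; tp <- 138
(tp + tn) / (tp + tn + fp + fn)  # overall accuracy: 0.794
tp / (tp + fn)                   # sensitivity (true positive rate): 0.590
tn / (tn + fp)                   # specificity (true negative rate): 0.884
```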

Create reduced model containing only important predictors for subsequent analyses

  • A reduced model is helpful because it decreases the number of predictor values needed to generate posterior predictions in the subsequent analyses.
m1d_reduce <- stan_glmer(
  surv ~ snowt1_z + elev_z + male_c + donor + (1 | translocation_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
starting worker pid=32560 on localhost:11161 at 14:13:29.133
(3 additional workers started)

SAMPLING FOR MODEL 'bernoulli' NOW (CHAINS 1-4).
[iteration progress omitted: 4 chains, 5000 warmup + 5000 sampling iterations each]
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 12.7264 seconds (Warm-up)
Chain 1:                12.4558 seconds (Sampling)
Chain 1:                25.1822 seconds (Total)
Chain 1: 
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 12.9418 seconds (Warm-up)
Chain 2:                12.0243 seconds (Sampling)
Chain 2:                24.9661 seconds (Total)
Chain 2: 
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 12.7903 seconds (Warm-up)
Chain 3:                12.5559 seconds (Sampling)
Chain 3:                25.3463 seconds (Total)
Chain 3: 
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 13.0066 seconds (Warm-up)
Chain 4:                12.7664 seconds (Sampling)
Chain 4:                25.773 seconds (Total)
Chain 4: 

Confirm fit of reduced model is comparable to full model

loo_m1d_reduce <- loo(m1d_reduce, cores = 4, save_psis = TRUE)
print(loo_m1d_reduce)

Computed from 20000 by 767 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
loo_compare(loo_m1d, loo_m1d_reduce)
           elpd_diff se_diff
m1d         0.0       0.0    
m1d_reduce  0.0       0.0    

Simulate and plot posterior predictive models & observations

m1d_group_means <- d1 %>% 
  group_by(translocation_id) %>% 
  summarize(count = n(), psurv = mean(surv), elev_z = mean(elev_z), snowt1_z = mean(snowt1_z)) %>% 
  mutate(
    male_c = as.factor(0.548), 
    donor = case_when(
      str_detect(translocation_id, "74976") | str_detect(translocation_id, "70556") ~ "70459",
      str_detect(translocation_id, "70413") ~ "70567",
      TRUE ~ "72996")) %>% 
  arrange(psurv)  # to arrange plot values/labels by psurv

predictions_m1d <- posterior_predict(m1d_reduce, newdata = m1d_group_means)

ppc_intervals(
  y = m1d_group_means$psurv,
  yrep = predictions_m1d, 
  prob = 0.5, 
  prob_outer = 0.8) +
ggplot2::scale_x_continuous(
  labels = m1d_group_means$translocation_id,
  breaks = 1:nrow(m1d_group_means)) +
  xaxis_text(angle = 90, vjust = 0.5) +
xlab("Translocation_id")

  • Uncertainty intervals appear incorrect for this binomial model (posterior_predict returns 0/1 survival outcomes rather than the group-level survival proportions plotted as y), but I am not sure how to adjust them.
  • Based on mean values, model does an adequate job of predicting survival based on group-level and individual-level characteristics.
  • What explains the low predicted survival at 70370 (both translocations), and high versus low predicted survival at 70550 for 1st and 2nd translocations, respectively? Both sites seem to contain high quality habitat (i.e., high elevation).
  • The predicted survival is based on the assumption that only males are included. Run another set of models for females?

Plot posterior plausible relationships between each predictor and the probability of frog survival (conditional effects)

  • As described in https://rdrr.io/cran/brms/man/conditional_effects.brmsfit.html, “When creating conditional_effects for a particular predictor (or interaction of two predictors), one has to choose the values of all other predictors to condition on. By default, the mean is used for continuous variables and the reference category is used for factors, but you may change these values via argument conditions. This also has an implication for the points argument: In the created plots, only those points will be shown that correspond to the factor levels actually used in the conditioning, in order not to create the false impression of bad model fit, where it is just due to conditioning on certain factor levels.”
  • Using rstanarm, I was unable to do this the way I expected: start with data = d1 and use "newdata =" to specify that predictions should be based on the data in d1_elevz (for example), in which the non-plotted continuous predictors are set to their mean values. I am not sure whether my alternative approach is correct.
  • Approach used: Instead of starting the routine with d1, I started with the new dataset in which all variables except the one of interest were set to mean values (continuous variables) and the reference level (categorical variables). This produced plots/results that were very similar to those produced in brms using the conditional_effects function.
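The idea behind the approach above is generic: hold every other predictor at a reference value and trace the inverse-logit curve over the predictor of interest. A minimal sketch in Python with made-up logistic-regression coefficients (not the fitted m1d_reduce values):

```python
import math

def inv_logit(x):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1 / (1 + math.exp(-x))

# Hypothetical coefficients, for illustration only
b0, b_elev, b_snow, b_male = -0.5, 1.2, -0.8, 0.3

# Condition on reference values: 0 (the mean) for the standardized snow
# predictor, males for sex; vary elevation alone.
snow_ref, male_ref = 0.0, 1.0
for elev in [-1.0, 0.0, 1.0]:
    p = inv_logit(b0 + b_elev * elev + b_snow * snow_ref + b_male * male_ref)
    print(elev, round(p, 3))  # survival probability rises with elevation
```

Each plotted "spaghetti" line in the conditional-effects plots below is this same calculation applied to one posterior draw of the coefficients.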

Effect of elevation

d1_elevz <- d1 %>% 
  mutate(snowt1_z = mean(snowt1_z), 
         male_c = recode(male_c, "-0.452" = "0.548"),
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_elevz %>% distinct(snowt1_z, male_c, donor) # ensure variables were set to correct values
  
d1_elevz %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = elev_z, y = surv)) +
    geom_line(aes(y = .epred, group = .draw), color = "blue", size = 0.2, alpha = 0.5) +
#    geom_point(aes(y = .epred)) +
  labs(x = "Elevation of recipient site (standardized)", y = "Probability of 1-year frog survival")



# d1 %>%  # Test of add_predicted_draws used for similar purpose - unsuccessful
#   select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
#   add_predicted_draws(m1d_reduce, ndraws = 100, re_formula = NA, newdata = d1_elevz) %>% View()
#   ggplot(aes(x = elev_z, y = surv)) +
#     geom_line(aes(y = .epred, group = .draw), color = "blue", size = 0.2, alpha = 0.5) +
#     geom_point(aes(y = .epred)) +
#   labs(x = "Elevation of recipient site (standardized)", y = "Probability of 1-year frog survival")

Effect of winter severity in year t1

d1_snowt1z <- d1 %>% 
  mutate(elev_z = mean(elev_z), 
         male_c = recode(male_c, "-0.452" = "0.548"),
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_snowt1z %>% distinct(elev_z, male_c, donor) # ensure variables were set to correct values
  
d1_snowt1z %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = snowt1_z, y = surv)) +
    geom_line(aes(y = .epred, group = .draw), color = "blue", size = 0.2, alpha = 0.5) +
#    geom_point(aes(y = .epred)) +
  labs(x = "Winter severity in year following translocation (standardized)", y = "Probability of 1-year frog survival") 

Effect of sex

d1_malec <- d1 %>% 
  mutate(elev_z = mean(elev_z), 
         snowt1_z = mean(snowt1_z),
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_malec %>% distinct(elev_z, snowt1_z, donor) # ensure variables were set to correct values
  
d1_malec %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = male_c, y = surv)) +
    geom_boxplot(aes(y = .epred, x = male_c)) +
#    geom_point(aes(y = .epred)) +
    labs(x = "Sex (centered)", y = "Probability of 1-year frog survival")

Effect of donor

d1_donor <- d1 %>% 
  mutate(elev_z = mean(elev_z), 
         snowt1_z = mean(snowt1_z),
         male_c = recode(male_c, "-0.452" = "0.548"))

d1_donor %>% distinct(elev_z, snowt1_z, male_c) # ensure variables were set to correct values
  
d1_donor %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = donor, y = surv)) +
    geom_boxplot(aes(y = .epred, x = donor)) +
#    geom_point(aes(y = .epred)) +
    labs(x = "Donor population", y = "Probability of 1-year frog survival")

Joint effects of elevation and sex

d1_elevz_malec <- d1 %>% 
  mutate(snowt1_z = mean(snowt1_z), 
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_elevz_malec %>% distinct(snowt1_z, donor) # ensure variables were set to correct values
  
pal <- c("#0d0887", "#f89540")
d1_elevz_malec %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = elev_z, y = surv, color = male_c)) +
    geom_line(aes(y = .epred, group = paste(male_c, .draw)), size = 0.4, alpha = 0.5) +
#    geom_point(aes(y = .epred)) +
    labs(x = "Elevation of recipient site (standardized)", y = "Probability of 1-year frog survival") +
    scale_color_manual(values = pal)  + 
    theme_classic()

For comparison, fit model using brms and conditional_effects function to plot variable-specific effects

Fit model

m1d_reduce_brms <- brm(
  surv ~ snowt1_z + elev_z + male_c + donor + (1 | translocation_id),
  data = d1,
  family = bernoulli(),
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
Compiling Stan program...
Start sampling
(Per-chain sampling progress omitted; 4 chains × 10000 iterations each, ~3 s warm-up and 3.0–4.3 s sampling per chain.)

Plot conditional effects

plot(conditional_effects(m1d_reduce_brms, spaghetti = TRUE, mean = TRUE, ndraws = 100), ask = FALSE)

Model 2: Survival of frogs from first translocation as predictor of survival in subsequent translocations

Load packages

library(tidyverse)
library(brms)
library(rstanarm)
library(bayesplot)
library(tidybayes)
library(loo)
library(broom.mixed)
library(janitor)

source("classification_summary.R")

Create frog translocation dataset for analysis

  • Read in data, create variables for analysis, standardize continuous variables.
  • arm package used to rescale continuous variables (subtract mean, divide by 2 SD); binary variables centered manually.
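For a continuous variable, the arm::rescale transform is simple enough to verify by hand; a sketch in Python (illustrative, not the arm source):

```python
import statistics

def rescale(x):
    """Gelman-style rescaling: center, then divide by 2 standard deviations,
    so the result has SD 0.5 and is comparable to a centered binary predictor."""
    m = statistics.mean(x)
    s = statistics.stdev(x)  # sample SD
    return [(xi - m) / (2 * s) for xi in x]

z = rescale([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(statistics.stdev(z), 3))  # → 0.5
```

The resulting SD of 0.5 is what puts the standardized and centered-binary coefficients on a comparable scale.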
d2 <- read_csv(here::here("data", "clean", "frog_translocation_final.csv")) %>% 
  select(site_id, year, pit_tag_ref, survival) %>% 
  mutate(
    surv = if_else(survival < 0.5, 0, 1), 
    across(c(site_id, year), as.factor),
    surv = as.integer(surv))
Rows: 779 Columns: 16
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr   (2): pit_tag_ref, sex
dbl  (13): site_id, elevation, shore, snow_t, snow_t1, year, day, order, donor, length, weight, bd_load, survival
date  (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Add a descriptive translocation_id
d2 <- d2 %>% 
  distinct(site_id, year) %>% 
  arrange(site_id, year) %>% 
  group_by(site_id) %>% 
  mutate(transno = seq_len(n())) %>% 
  ungroup(site_id) %>% 
  unite(translocation_id, c("site_id", "transno"), remove = FALSE) %>% 
  mutate(translocation_id = as.factor(translocation_id)) %>% 
  inner_join(d2, by = c("site_id", "year")) %>% 
  relocate(translocation_id, .after = site_id) %>% 
  relocate(transno, .after = translocation_id)
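The numbering logic above (arrange by site and year, then assign a sequential translocation number within each site and paste it onto the site ID) can be sketched outside R. A minimal Python illustration with toy site/year values, not the real dataset:

```python
from itertools import groupby

# Toy (site_id, year) pairs, already distinct; years are invented.
site_years = [("70370", 2006), ("70370", 2008), ("70550", 2013),
              ("70550", 2015), ("70550", 2017)]

site_years.sort()  # arrange(site_id, year)
translocation_ids = []
for site, grp in groupby(site_years, key=lambda r: r[0]):
    # seq_len(n()) within each site, then unite(site_id, transno)
    for transno, (_, year) in enumerate(grp, start=1):
        translocation_ids.append((site, year, f"{site}_{transno}"))

print(translocation_ids)
```

This yields labels like "70370_1" and "70550_3" that identify each site's first, second, and later translocations.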

Restructure dataset

# Remove sites that received only 1 translocation
d2 <- d2 %>% 
  distinct(site_id, transno) %>% 
  group_by(site_id) %>% 
  mutate(trans_max = max(transno)) %>% 
  ungroup(site_id) %>% 
  select(-transno) %>% 
  filter(trans_max > 1) %>% 
  distinct(site_id, trans_max) %>% 
  inner_join(d2, by = c("site_id"))

# Create predictor from survival of first translocated cohort
d2 <- d2 %>% 
  filter(transno == 1) %>% 
  group_by(site_id) %>% 
  mutate(survival_trans1 = mean(survival)) %>%    # use mean or median?
  ungroup(site_id) %>% 
  distinct(site_id, survival_trans1) %>% 
  mutate(across(c(survival_trans1), round, 3)) %>% 
  inner_join(d2, by = c("site_id")) %>% 
  filter(transno != 1) %>% 
  relocate(translocation_id, .after = site_id) %>% 
  mutate(survival_trans1_z = arm::rescale(survival_trans1), .after = survival_trans1) %>% 
  select(-transno, -trans_max) 
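The restructuring above keeps only sites that received more than one translocation, and attaches the mean survival of each site's first cohort as a predictor for the later cohorts. A pure-Python sketch of the same logic, using toy records rather than the real data:

```python
from collections import defaultdict

# Toy records: (site_id, transno, survival); not the real dataset.
rows = [
    ("A", 1, 0.0), ("A", 1, 1.0), ("A", 2, 1.0), ("A", 2, 0.0),
    ("B", 1, 1.0), ("B", 2, 1.0), ("B", 3, 0.0),
    ("C", 1, 0.0),  # single translocation: dropped by the trans_max > 1 filter
]

# Mean survival of each site's first translocated cohort
first = defaultdict(list)
for site, transno, surv in rows:
    if transno == 1:
        first[site].append(surv)
surv_trans1 = {site: sum(v) / len(v) for site, v in first.items()}

# Keep later cohorts at multi-translocation sites, attaching the predictor
max_trans = defaultdict(int)
for site, transno, _ in rows:
    max_trans[site] = max(max_trans[site], transno)
d2 = [(site, transno, surv, surv_trans1[site])
      for site, transno, surv in rows
      if max_trans[site] > 1 and transno != 1]

print(d2)
```

Site "C" drops out entirely, and every remaining second- or third-cohort frog carries its site's first-cohort survival as survival_trans1.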

Model with non-standardized predictor, no grouping variable

Model specification

m2a <- stan_glm(
  surv ~ survival_trans1,
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
(Per-chain sampling progress omitted; 4 chains × 10000 iterations each, ~0.36–0.38 s total per chain.)

List model variable names

get_variables(m2a)
[1] "(Intercept)"     "survival_trans1" "accept_stat__"   "stepsize__"      "treedepth__"     "n_leapfrog__"    "divergent__"    
[8] "energy__"       

Priors & chain diagnostics

prior_summary(m2a)
Priors for model 'm2a' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 9)
------
See help('prior_summary.stanreg') for more details
b2 <- c("(Intercept)", "survival_trans1")

mcmc_trace(m2a, size = 0.1, pars = b2)

mcmc_dens_overlay(m2a, pars = b2)


summary(m2a)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      surv ~ survival_trans1
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 272
 predictors:   2

Estimates:
                  mean   sd   10%   50%   90%
(Intercept)     -2.6    0.3 -3.0  -2.6  -2.2 
survival_trans1  4.1    0.6  3.4   4.1   4.9 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                mcse Rhat n_eff
(Intercept)     0.0  1.0   9446
survival_trans1 0.0  1.0  10258
mean_PPD        0.0  1.0  14754
log-posterior   0.0  1.0   8226

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
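On the probability scale, the posterior means above imply a strong effect of first-cohort survival. A quick back-of-the-envelope check (sketched in Python), using the point estimates from the summary:

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

b0, b1 = -2.6, 4.1  # posterior means of (Intercept) and survival_trans1

# Predicted 1-year survival probability when the first cohort had
# 0% versus 100% survival:
p_low = inv_logit(b0 + b1 * 0.0)
p_high = inv_logit(b0 + b1 * 1.0)
print(round(p_low, 3), round(p_high, 3))  # → 0.069 0.818
```

At sites where the first cohort all died, predicted survival of later cohorts is under 10%; at sites where the first cohort all survived, it exceeds 80%.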

Posterior predictive check

pp_check(m2a, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m2a <- loo(m2a, cores = 4, save_psis = TRUE)
print(loo_m2a)

Computed from 20000 by 272 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m2a, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2a, area_method = "equal height", prob = 0.90, pars = b2) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with unstandardized predictor & no grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m2a))
[1] 0.2454747

Model with standardized predictor, no grouping variable

Model specification

m2b <- stan_glm(
  surv ~ survival_trans1_z,
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
(Per-chain sampling progress omitted; 4 chains × 10000 iterations each, ~0.36–0.38 s total per chain.)

List model variable names

get_variables(m2b)
[1] "(Intercept)"       "survival_trans1_z" "accept_stat__"     "stepsize__"        "treedepth__"       "n_leapfrog__"      "divergent__"      
[8] "energy__"         

Priors & chain diagnostics

prior_summary(m2b)
Priors for model 'm2b' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 5)
------
See help('prior_summary.stanreg') for more details
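If I understand rstanarm's autoscaling correctly, the "adjusted prior" scale is the default scale of 2.5 divided by the predictor's SD. Because arm::rescale produces a predictor with SD 0.5, the adjusted scale here is 2.5 / 0.5 = 5, consistent with the output above, and the adjusted scale of 9 for the unstandardized model implies sd(survival_trans1) ≈ 0.28. A trivial arithmetic check (Python):

```python
default_scale = 2.5
sd_z = 0.5  # SD of an arm::rescale-standardized predictor

print(default_scale / sd_z)          # → 5.0, the adjusted scale for m2b
print(round(default_scale / 9, 2))   # → 0.28, implied sd(survival_trans1) in m2a
```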
b3 <- c("(Intercept)", "survival_trans1_z")

mcmc_trace(m2b, size = 0.1, pars = b3)

mcmc_dens_overlay(m2b, pars = b3)


summary(m2b)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      surv ~ survival_trans1_z
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 272
 predictors:   2

Estimates:
                    mean   sd   10%   50%   90%
(Intercept)       -1.1    0.2 -1.3  -1.1  -0.9 
survival_trans1_z  2.3    0.3  1.9   2.3   2.7 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                  mcse Rhat n_eff
(Intercept)       0.0  1.0  10462
survival_trans1_z 0.0  1.0  10258
mean_PPD          0.0  1.0  14754
log-posterior     0.0  1.0   8226

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Posterior predictive check

pp_check(m2b, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m2b <- loo(m2b, cores = 4, save_psis = TRUE)
print(loo_m2b)

Computed from 20000 by 272 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
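The "good (k < 0.5)" label reflects the Pareto-k categories this version of the loo package reports; a sketch of that classification (thresholds as printed by loo's diagnostics):

```python
def pareto_k_label(k: float) -> str:
    """Classify a Pareto shape estimate k as in loo's printed diagnostics:
    PSIS-LOO estimates are reliable for small k and progressively less
    trustworthy as k grows."""
    if k <= 0.5:
        return "good"
    if k <= 0.7:
        return "ok"
    if k <= 1.0:
        return "bad"
    return "very bad"

print([pareto_k_label(k) for k in (0.3, 0.6, 0.8, 1.2)])
# ['good', 'ok', 'bad', 'very bad']
```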

Posterior analysis

tidy(m2b, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2b, area_method = "equal height", prob = 0.90, pars = b3) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictor & no grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m2b))
[1] 0.2454747

Model with standardized predictor, translocation_id grouping variable

Model specification

m2c <- stan_glmer(
  surv ~ survival_trans1_z + (1 | translocation_id),
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
starting worker pid=28232 on localhost:11161 at 12:16:09.414
starting worker pid=28261 on localhost:11161 at 12:16:09.553
starting worker pid=28290 on localhost:11161 at 12:16:09.679
starting worker pid=28319 on localhost:11161 at 12:16:09.807

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 4.6e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.46 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 4e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 4.8e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 5.4e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.54 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 2.46557 seconds (Warm-up)
Chain 3:                2.02194 seconds (Sampling)
Chain 3:                4.48751 seconds (Total)
Chain 3: 
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 2.44814 seconds (Warm-up)
Chain 1:                2.63434 seconds (Sampling)
Chain 1:                5.08248 seconds (Total)
Chain 1: 
Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 2.56256 seconds (Warm-up)
Chain 4:                2.27234 seconds (Sampling)
Chain 4:                4.8349 seconds (Total)
Chain 4: 
Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 2.68597 seconds (Warm-up)
Chain 2:                2.80953 seconds (Sampling)
Chain 2:                5.49551 seconds (Total)
Chain 2: 

Priors & chain diagnostics

prior_summary(m2c)
Priors for model 'm2c' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 5)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
b3 <- c("(Intercept)", "survival_trans1_z")

mcmc_trace(m2c, size = 0.1, pars = b3)

mcmc_dens_overlay(m2c, pars = b3)


summary(m2c)

Model Info:
 function:     stan_glmer
 family:       binomial [logit]
 formula:      surv ~ survival_trans1_z + (1 | translocation_id)
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 272
 groups:       translocation_id (12)

Estimates:
                                                  mean   sd   10%   50%   90%
(Intercept)                                     -1.2    0.2 -1.5  -1.1  -0.9 
survival_trans1_z                                2.4    0.5  1.8   2.4   3.0 
b[(Intercept) translocation_id:70370_2]          0.1    0.5 -0.4   0.1   0.7 
b[(Intercept) translocation_id:70413_2]          0.1    0.4 -0.4   0.1   0.6 
b[(Intercept) translocation_id:70413_3]         -0.1    0.4 -0.6  -0.1   0.4 
b[(Intercept) translocation_id:70449_2]          0.8    0.5  0.2   0.7   1.4 
b[(Intercept) translocation_id:70505_2]          0.1    0.4 -0.4   0.1   0.7 
b[(Intercept) translocation_id:70505_3]         -0.1    0.5 -0.7  -0.1   0.5 
b[(Intercept) translocation_id:70505_4]         -0.1    0.4 -0.6  -0.1   0.4 
b[(Intercept) translocation_id:70550_2]         -0.4    0.4 -1.0  -0.4   0.1 
b[(Intercept) translocation_id:70556_2]          0.3    0.4 -0.2   0.2   0.8 
b[(Intercept) translocation_id:70641_2]         -0.2    0.5 -0.8  -0.1   0.3 
b[(Intercept) translocation_id:70641_3]         -0.4    0.5 -1.0  -0.3   0.2 
b[(Intercept) translocation_id:74976_2]         -0.2    0.5 -0.7  -0.1   0.4 
Sigma[translocation_id:(Intercept),(Intercept)]  0.4    0.3  0.0   0.3   0.8 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                                mcse Rhat n_eff
(Intercept)                                     0.0  1.0   9656
survival_trans1_z                               0.0  1.0  10091
b[(Intercept) translocation_id:70370_2]         0.0  1.0  16892
b[(Intercept) translocation_id:70413_2]         0.0  1.0  15098
b[(Intercept) translocation_id:70413_3]         0.0  1.0  16009
b[(Intercept) translocation_id:70449_2]         0.0  1.0   8361
b[(Intercept) translocation_id:70505_2]         0.0  1.0  16196
b[(Intercept) translocation_id:70505_3]         0.0  1.0  19069
b[(Intercept) translocation_id:70505_4]         0.0  1.0  17365
b[(Intercept) translocation_id:70550_2]         0.0  1.0  14590
b[(Intercept) translocation_id:70556_2]         0.0  1.0  13817
b[(Intercept) translocation_id:70641_2]         0.0  1.0  17808
b[(Intercept) translocation_id:70641_3]         0.0  1.0  15354
b[(Intercept) translocation_id:74976_2]         0.0  1.0  13143
Sigma[translocation_id:(Intercept),(Intercept)] 0.0  1.0   8590
mean_PPD                                        0.0  1.0  23552
log-posterior                                   0.1  1.0   4997

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Posterior predictive check

pp_check(m2c, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m2c <- loo(m2c, cores = 4, save_psis = TRUE)
print(loo_m2c)

Computed from 20000 by 272 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m2c, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2c, area_method = "equal height", prob = 0.90, pars = b3) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictor & translocation_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m2c))
[1] 0.2747476
# Frog survival by translocation_id
mcmc_areas_ridges(m2c, regex_pars = "translocation_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each translocation_id")

Model with standardized predictor, site_id grouping variable

Model specification

m2d <- stan_glmer(
  surv ~ survival_trans1_z + (1 | site_id),
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)

Priors & chain diagnostics

prior_summary(m2d)
Priors for model 'm2d' 
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 2.5)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 5)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details
b3 <- c("(Intercept)", "survival_trans1_z")

mcmc_trace(m2d, size = 0.1, pars = b3)

mcmc_dens_overlay(m2d, pars = b3)


summary(m2d)

Model Info:
 function:     stan_glmer
 family:       binomial [logit]
 formula:      surv ~ survival_trans1_z + (1 | site_id)
 algorithm:    sampling
 sample:       20000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 272
 groups:       site_id (8)

Estimates:
                                         mean   sd   10%   50%   90%
(Intercept)                            -1.1    0.3 -1.5  -1.1  -0.7 
survival_trans1_z                       2.3    0.6  1.6   2.3   3.0 
b[(Intercept) site_id:70370]            0.1    0.6 -0.6   0.1   0.8 
b[(Intercept) site_id:70413]            0.0    0.4 -0.5   0.0   0.5 
b[(Intercept) site_id:70449]            0.8    0.5  0.2   0.8   1.5 
b[(Intercept) site_id:70505]           -0.1    0.5 -0.7  -0.1   0.4 
b[(Intercept) site_id:70550]           -0.5    0.5 -1.2  -0.5   0.0 
b[(Intercept) site_id:70556]            0.3    0.5 -0.3   0.3   1.0 
b[(Intercept) site_id:70641]           -0.5    0.6 -1.3  -0.5   0.1 
b[(Intercept) site_id:74976]           -0.2    0.6 -0.9  -0.1   0.5 
Sigma[site_id:(Intercept),(Intercept)]  0.6    0.6  0.1   0.4   1.2 

Fit Diagnostics:
           mean   sd   10%   50%   90%
mean_PPD 0.3    0.0  0.3   0.3   0.3  

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                       mcse Rhat n_eff
(Intercept)                            0.0  1.0   7228
survival_trans1_z                      0.0  1.0   7721
b[(Intercept) site_id:70370]           0.0  1.0  12554
b[(Intercept) site_id:70413]           0.0  1.0   9300
b[(Intercept) site_id:70449]           0.0  1.0   7916
b[(Intercept) site_id:70505]           0.0  1.0   8909
b[(Intercept) site_id:70550]           0.0  1.0  10698
b[(Intercept) site_id:70556]           0.0  1.0  10257
b[(Intercept) site_id:70641]           0.0  1.0   8917
b[(Intercept) site_id:74976]           0.0  1.0   9439
Sigma[site_id:(Intercept),(Intercept)] 0.0  1.0   6799
mean_PPD                               0.0  1.0  21901
log-posterior                          0.0  1.0   4346

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Posterior predictive check

pp_check(m2d, plotfun = "stat") +
    xlab("Frog survival rate")

Calculate loo

loo_m2d <- loo(m2d, cores = 4, save_psis = TRUE)
print(loo_m2d)

Computed from 20000 by 272 log-likelihood matrix
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Posterior analysis

tidy(m2d, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2d, area_method = "equal height", prob = 0.90, pars = b3) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictor & site_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")


mean(bayes_R2(m2d))
[1] 0.2791572
# Frog survival by site_id
mcmc_areas_ridges(m2d, regex_pars = "site_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each site_id")

Compare 3 models with different grouping structures

loo_compare(loo_m2b, loo_m2c, loo_m2d)
    elpd_diff se_diff
m2d  0.0       0.0   
m2c -1.4       0.7   
m2b -3.1       2.9   
  • Based on expected log-predictive densities (ELPD), the models with and without grouping variables have similar posterior predictive accuracy (elpd_diff of m2b relative to m2d is less than 2 SE below zero: elpd_diff = -3.1 $\pm$ 5.8; the 2 SE interval, -8.9 to 2.7, overlaps 0).
  • The conclusion from all 3 models is the same: average frog survival in the first translocation is an important predictor of the survival of individual frogs in subsequent translocations.
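The 2 SE rule of thumb applied in the first bullet can be verified arithmetically (Python used only for the arithmetic; the elpd_diff and se_diff values are those printed by loo_compare above):

```python
# elpd_diff and se_diff relative to the best model (m2d), from loo_compare()
comparisons = {"m2c": (-1.4, 0.7), "m2b": (-3.1, 2.9)}

intervals = {}
for name, (diff, se) in comparisons.items():
    lo, hi = diff - 2 * se, diff + 2 * se   # rough 2-SE interval on elpd_diff
    intervals[name] = (lo, hi, lo <= 0 <= hi)

for name, (lo, hi, overlaps) in intervals.items():
    print(f"{name}: 2-SE interval [{lo:.1f}, {hi:.1f}], overlaps 0: {overlaps}")
```

Both intervals cover 0, supporting the conclusion that the grouping structure does not meaningfully change predictive accuracy here.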

Posterior classification accuracy of best models

# Generate confusion matrix for m2b
set.seed(84735)
classification_summary(m2b, data = d2, cutoff = 0.5)
$confusion_matrix
 y   0  1
 0 163 28
 1  33 48

$accuracy_rates
# Generate confusion matrix for m2d
set.seed(84735)
classification_summary(m2d, data = d2, cutoff = 0.5)
$confusion_matrix
 y   0  1
 0 163 28
 1  33 48

$accuracy_rates
NA
  • As expected from the loo results, the models with and without a grouping variable produce identical confusion matrices and therefore the same classification accuracy.
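The accuracy rates implied by the confusion matrix above (identical for m2b and m2d, and presumably what the truncated $accuracy_rates element would report) can be computed directly; Python is used here only for the arithmetic:

```python
# Confusion matrix for m2b / m2d (rows: observed y, columns: predicted class)
tn, fp = 163, 28   # y = 0 frogs classified as 0 / as 1
fn, tp = 33, 48    # y = 1 frogs classified as 0 / as 1

total = tn + fp + fn + tp        # 272, matching the number of observations
accuracy = (tn + tp) / total     # overall classification accuracy
specificity = tn / (tn + fp)     # true negative rate
sensitivity = tp / (tp + fn)     # true positive rate
print(f"n={total}, accuracy={accuracy:.3f}, "
      f"specificity={specificity:.3f}, sensitivity={sensitivity:.3f}")
```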

TO DO

  • Validate priors, tune as necessary. Add final priors to model specification (in case the defaults change).
  • Check approach to plotting conditional effects.
  • Try un-standardizing predictor values in conditional-effects plots (and other plots?) to produce more intuitive visualizations.
---
title: "Predictors of frog survival following translocations - analyses"
author: "Roland Knapp"
output: html_notebook
---

The goal of the analyses in this notebook is to identify predictors of frog survival following translocation. The analyses are conducted in a Bayesian framework using the R package rstanarm.

# Model 1: Site-level and frog-level predictors

## Load packages

```{r load-packages}
library(tidyverse)
library(brms)
library(rstanarm)
library(bayesplot)
library(tidybayes)
library(loo)
library(broom.mixed)
library(janitor)

source("classification_summary.R")
```

## Create frog translocation dataset for analysis  
* Read in data, create variables for analysis, and standardize continuous variables. 
* The arm package is used to rescale continuous variables (subtract the mean, divide by 2 SD); binary variables are centered manually.
```{r read-translocation-data}
d1 <- read_csv(here::here("data", "clean", "frog_translocation_final.csv")) %>% 
  drop_na(bd_load) %>%  # drop 12 records where bd_load is NA (9 are from 70413)
  mutate(
    first = if_else(order == 1, 1, 0),
    surv = if_else(survival < 0.5, 0, 1),
    male = if_else(sex == "m", 1, 0),
    elev_lo = if_else(elevation < 3020, 1, 0), # using interval defined by cut_interval, n = 2
    elev_z = arm::rescale(elevation),
    snowt_z = arm::rescale(snow_t),
    snowt1_z = arm::rescale(snow_t1),
    day_z = arm::rescale(day),
    length_z = arm::rescale(length),
    bdload_l = log10(bd_load + 1),
    bdload_z = arm::rescale(bdload_l),
    shore_c = shore - mean(shore),
    male_c = male - mean(male),
    elevlo_c = elev_lo - mean(elev_lo),
    first_c = first - mean(first),
    across(c(shore_c, male_c, elevlo_c, first_c), round, 3), 
    across(c(site_id, shore_c, first_c, donor, male_c, elevlo_c, year), as.factor),
    surv = as.integer(surv))

# Add a descriptive translocation_id
d1 <- d1 %>% 
  distinct(site_id, year) %>% 
  arrange(site_id, year) %>% 
  group_by(site_id) %>% 
  mutate(transno = seq_len(n())) %>% 
  ungroup(site_id) %>% 
  unite(translocation_id, c("site_id", "transno"), remove = FALSE) %>% 
  select(-transno) %>% 
  mutate(translocation_id = as.factor(translocation_id)) %>% 
  inner_join(d1, by = c("site_id", "year")) %>% 
  relocate(translocation_id, .after = site_id)
```
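The arm::rescale() transformation applied above — the Gelman (2008) convention of subtracting the mean and dividing by 2 standard deviations, which puts continuous and binary predictors on comparable scales — can be sketched outside R (Python, with hypothetical elevation values, for illustration only):

```python
from statistics import mean, stdev

def rescale(x):
    """Gelman-style rescaling as in arm::rescale: subtract the mean,
    divide by 2 sample standard deviations."""
    m, s = mean(x), stdev(x)
    return [(xi - m) / (2 * s) for xi in x]

elevation = [2800, 3000, 3100, 3300]   # hypothetical values, metres
print([round(z, 3) for z in rescale(elevation)])
```

The rescaled values are centered at 0, and a 1-unit change corresponds to a 2-SD change in the original variable, so coefficients are roughly comparable to those of centered binary predictors.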

## Null values - check

```{r check-null}
d1 %>% summarize(across(everything(), ~ sum(is.na(.))))
```

## Model using non-standardized predictors
### Model specification
```{r}
m1a <- stan_glm(
  surv ~ length + snow_t + snow_t1 + day + bdload_l + elevation + first + male + donor,
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2,
  cores = 4,
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m1a)

mcmc_trace(m1a, size = 0.1)
mcmc_dens_overlay(m1a)

summary(m1a)
```
- mcmc diagnostics all look good.

### List model variable names
```{r}
get_variables(m1a)
```

### Posterior predictive check
```{r}
pp_check(m1a, plotfun = "stat") +
    xlab("Frog survival rate")
```
- Posterior predictive distribution implied by regression model closely matches original data (based on mean frog survival).

### Calculate leave-one-out cross-validation (loo)
```{r}
loo_m1a <- loo(m1a, cores = 4, save_psis = TRUE)
print(loo_m1a)
```

### Posterior analysis
```{r}
tidy(m1a, conf.int = TRUE, conf.level = 0.90)

mcmc_intervals(m1a, point_est = "median", prob = 0.80, prob_outer = 0.9) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  # add 0-line to accentuate default line
  ggtitle("Model with non-standardized predictors only",
          "Posterior distributions with medians & 90% uncertainty intervals")

mcmc_areas(m1a, area_method = "equal height", prob = 0.90) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with non-standardized predictors only",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m1a))
```
- Using non-standardized predictors makes the plot hard to read because of the vastly different scales of the predictors.  
- Based on 90% uncertainty intervals, important predictors are snow_t1, elevation, first, donor70567, and donor72996.

## Model using standardized predictors
### Model specification
```{r}
m1b <- stan_glm(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor,
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4, 
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m1b)

mcmc_trace(m1b, size = 0.1)
mcmc_dens_overlay(m1b)

summary(m1b)
```
- mcmc diagnostics all look good.

### List model variable names
```{r}
get_variables(m1b)
```

### Posterior predictive check
```{r}
pp_check(m1b, plotfun = "stat") +
    xlab("Frog survival rate")
```
- Posterior predictive distribution implied by regression model closely matches original data (based on mean frog survival).

### Calculate loo
```{r}
loo_m1b <- loo(m1b, cores = 4, save_psis = TRUE)
print(loo_m1b)
```

### Posterior analysis
```{r}
tidy(m1b, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1b, area_method = "equal height", prob = 0.90) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors only",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m1b))
```
- As expected, using standardized predictors allows easy comparison across predictors. Important predictors are unchanged. 
- Based on 90% uncertainty intervals, important predictors are snow_t1, elevation, first, donor70567, and donor72996.

## Compare models
```{r}
loo_compare(loo_m1a, loo_m1b)
```
- As expected, the models with non-standardized and standardized predictor variables have essentially identical predictive accuracy.  

## Possible grouping structures to add to model
The dataset contains three potential grouping variables: site_id, translocation_id, and donor site. Donor site is not suitable as a grouping variable because it has only 3 levels, too few to be useful. site_id and translocation_id contain 12 and 24 levels, respectively, making both suitable as grouping variables. Including both as nested grouping variables (translocations nested within sites) also seems justified. More levels within a group should increase the power to detect effects of predictors; with fewer levels and less variability, an effect may be absorbed by the grouping variable itself rather than attributed to a predictor. With only 1-4 translocations per site and 12 sites in total, the model with site_id as the sole grouping variable and the model with translocation_id nested within site_id may attribute more of the variation in frog survival to the grouping variables and less to the predictors. A priori, that suggests the model with translocation_id as the grouping variable may be the most useful structure (assuming all models have similar posterior predictive accuracy). Lacking a strong rationale for one structure over another, I built 3 models and compared their posterior predictive accuracy using loo. 
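The intuition in the paragraph above — that a grouping variable can absorb variation that would otherwise be attributed to predictors — is the partial-pooling behavior of multilevel models. A stylized normal-normal sketch (all numbers hypothetical; the model fit here is logistic, so this is only the qualitative picture) shows how group estimates shrink toward the grand mean:

```python
def pooled_mean(group_mean, n, grand_mean, sigma2_within, sigma2_between):
    """Normal-normal partial pooling: a group's estimate is pulled toward the
    grand mean, more strongly when the group has few observations or the
    between-group variance is small."""
    precision_data = n / sigma2_within
    precision_prior = 1 / sigma2_between
    w = precision_data / (precision_data + precision_prior)
    return w * group_mean + (1 - w) * grand_mean

# Hypothetical numbers: a group observed at 0.6 survival shrinks toward an
# overall 0.3 -- much more when it contributed only 5 frogs than 50.
small = pooled_mean(0.6, n=5, grand_mean=0.3, sigma2_within=0.25, sigma2_between=0.05)
large = pooled_mean(0.6, n=50, grand_mean=0.3, sigma2_within=0.25, sigma2_between=0.05)
print(round(small, 3), round(large, 3))
```

Groups with many levels and few observations per level (like translocation_id) are shrunk more per level, which is one reason the different grouping structures can redistribute variation between group effects and predictors.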

## Model using standardized predictors and site_id grouping variable
### Model specification
```{r}
m1c <- stan_glmer(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor + (1 | site_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m1c)

b1 <- c("(Intercept)", "length_z", "snowt_z", "snowt1_z", "day_z", "bdload_z", "elev_z", "first_c0.353", "male_c0.548", "donor70567", "donor72996")

mcmc_trace(m1c, size = 0.1, pars = b1)
mcmc_dens_overlay(m1c, pars = b1)

summary(m1c)
```

### Posterior predictive check
```{r}
pp_check(m1c, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m1c <- loo(m1c, cores = 4, save_psis = TRUE)
print(loo_m1c)
```

### Posterior analysis
```{r}
tidy(m1c, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1c, area_method = "equal height", prob = 0.90, pars = b1) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors & site_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m1c))

# Frog survival by site_id
mcmc_areas_ridges(m1c, regex_pars = "site_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each recipient site_id")
```
- Based on 90% uncertainty intervals, the set of important predictors is reduced to elevation and male. 

## Model using standardized predictors and translocation_id grouping variable
### Model specification
```{r}
m1d <- stan_glmer(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor + (1 | translocation_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m1d)

b1 <- c("(Intercept)", "length_z", "snowt_z", "snowt1_z", "day_z", "bdload_z", "elev_z", "first_c0.353", "male_c0.548", "donor70567", "donor72996")

mcmc_trace(m1d, size = 0.1, pars = b1)
mcmc_dens_overlay(m1d, pars = b1)

summary(m1d)
```

### Posterior predictive check
```{r}
pp_check(m1d, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m1d <- loo(m1d, cores = 4, save_psis = TRUE)
print(loo_m1d)
```

### Posterior analysis
```{r}
tidy(m1d, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1d, area_method = "equal height", prob = 0.90, pars = b1) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors & translocation_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m1d))

# Frog survival by translocation_id
mcmc_areas_ridges(m1d, regex_pars = "translocation_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each translocation_id")
```
- Based on loo, model with translocation_id as grouping variable provides slightly better fit than model with site_id as grouping variable. 
- Based on 90% uncertainty intervals, important predictors are similar to those in model without grouping variable: snowt1_z, elev_z, male_c0.548, donor72996. 

## Model using standardized predictors and 2 grouping variables: translocation_id nested within site_id
### Model specification
```{r}
m1e <- stan_glmer(
  surv ~ length_z + snowt_z + snowt1_z + day_z + bdload_z + elev_z + first_c + male_c + donor + (1 | translocation_id) + (1 | site_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m1e)

b1 <- c("(Intercept)", "length_z", "snowt_z", "snowt1_z", "day_z", "bdload_z", "elev_z", "first_c0.353", "male_c0.548", "donor70567", "donor72996")

mcmc_trace(m1e, size = 0.1, pars = b1)
mcmc_dens_overlay(m1e, pars = b1)

summary(m1e)
```

### Posterior predictive check
```{r}
pp_check(m1e, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m1e <- loo(m1e, cores = 4, save_psis = TRUE)
print(loo_m1e)
```

### Posterior analysis
```{r}
tidy(m1e, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m1e, area_method = "equal height", prob = 0.90, pars = b1) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictors & nested grouping variables (translocation_id within site_id)",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m1e))

# Frog survival by translocation_id
mcmc_areas_ridges(m1e, regex_pars = ("translocation_id")) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each translocation_id")

# Frog survival by site_id
mcmc_areas_ridges(m1e, regex_pars = "site_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each site_id")
```

## Compare 4 models with different grouping structures
```{r}
loo_compare(loo_m1b, loo_m1c, loo_m1d, loo_m1e)
```
- Based on expected log-predictive densities (ELPD), the 3 models with grouping variables all have substantially higher posterior predictive accuracy than the model without any grouping variable (elpd_diff of m1b relative to m1c is -18.2 $\pm$ 12.6; the 2 SE interval, -30.8 to -5.6, does not overlap 0). 
- The 3 models with 1 or more grouping variables have similar posterior predictive accuracy (elpd_diff of m1d and m1e relative to m1c are within 2 SE of 0).  
- m1d seems to provide more insight into the effects of the predictors, so this model was chosen for additional analysis. 
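The 2 SE rule used above can be checked directly from the `elpd_diff` and `se_diff` columns of the loo_compare table; a minimal sketch using the m1b-vs-m1c difference reported above (here the $\pm$ 12.6 is taken to be 2 SE, i.e., se_diff = 6.3):

```{r}
# A model is considered substantially worse when its elpd_diff is more
# than ~2 SE below 0, i.e., when the 2 SE interval excludes 0.
elpd_diff <- -18.2              # m1b relative to m1c (from loo_compare)
se_diff   <- 6.3                # assuming the +/- 12.6 above is 2 SE
interval  <- elpd_diff + c(-2, 2) * se_diff
interval                        # -30.8 to -5.6: excludes 0
```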

## Additional analysis of model m1d
### Posterior classification accuracy
```{r}
# Posterior predictive models for each frog in dataset
set.seed(84735)
survival_predict_1 <- posterior_predict(m1d, newdata = d1)

# Classify frog survival based on mean predicted survival probability
survival_predict_2 <- d1 %>% 
  mutate(survival_prob = colMeans(survival_predict_1),
         survival_class = as.numeric(survival_prob >= 0.5)) %>% 
  select(site_id, translocation_id, year, snowt1_z, elev_z, male_c, donor, surv, survival_prob, survival_class)

# Generate confusion matrix and calculate accuracy
survival_predict_2 %>% 
  tabyl(surv, survival_class) %>% 
  adorn_totals(c("row", "col"))

(471 + 138)/767 # overall_accuracy 
471/533 # true negative rate (specificity)
138/234 # true positive rate (sensitivity)

# Generate confusion matrix using function
set.seed(84735)
classification_summary(m1d, data = d1, cutoff = 0.5)
```
- Sensitivity or "true positive rate" measures the proportion of Y = 1 observations that are accurately classified.  
- Specificity or "true negative rate" measures the proportion of Y = 0 observations that are accurately classified (i.e., proportion of frogs correctly identified as not surviving).  
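The rates reported above follow directly from the confusion-matrix counts; a minimal sketch of the calculation (the function name `confusion_rates` is mine; `classification_summary` reports the same quantities):

```{r}
# Classification rates from observed outcomes (y) and model
# classifications (yhat), both coded 0/1.
confusion_rates <- function(y, yhat) {
  tp <- sum(y == 1 & yhat == 1)  # survivors classified as surviving
  tn <- sum(y == 0 & yhat == 0)  # non-survivors classified as not surviving
  c(accuracy    = (tp + tn) / length(y),
    sensitivity = tp / sum(y == 1),  # true positive rate
    specificity = tn / sum(y == 0))  # true negative rate
}

# Counts from the confusion matrix above: 533 non-survivors (471
# classified correctly), 234 survivors (138 classified correctly)
r <- confusion_rates(
  y    = c(rep(0, 533), rep(1, 234)),
  yhat = c(rep(0, 471), rep(1, 62), rep(0, 96), rep(1, 138)))
round(r, 3)  # accuracy 0.794, sensitivity 0.590, specificity 0.884
```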

### Create reduced model containing only important predictors for subsequent analyses
- A reduced model is helpful because it limits the number of predictor values that must be specified when generating posterior predictions. 
```{r}
m1d_reduce <- stan_glmer(
  surv ~ snowt1_z + elev_z + male_c + donor + (1 | translocation_id),
  data = d1,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Confirm fit of reduced model is comparable to full model
```{r}
loo_m1d_reduce <- loo(m1d_reduce, cores = 4, save_psis = TRUE)
print(loo_m1d_reduce)

loo_compare(loo_m1d, loo_m1d_reduce)
```

### Simulate and plot posterior predictive models & observations
```{r}
m1d_group_means <- d1 %>% 
  group_by(translocation_id) %>% 
  summarize(count = n(), psurv = mean(surv), elev_z = mean(elev_z), snowt1_z = mean(snowt1_z)) %>% 
  mutate(
    male_c = as.factor(0.548), 
    donor = case_when(
      str_detect(translocation_id, "74976") | str_detect(translocation_id, "70556") ~ "70459",
      str_detect(translocation_id, "70413") ~ "70567",
      TRUE ~ "72996")) %>% 
  arrange(psurv)  # to arrange plot values/labels by psurv

predictions_m1d <- posterior_predict(m1d_reduce, newdata = m1d_group_means)

ppc_intervals(
  y = m1d_group_means$psurv,
  yrep = predictions_m1d, 
  prob = 0.5, 
  prob_outer = 0.8) +
  ggplot2::scale_x_continuous(
    labels = m1d_group_means$translocation_id,
    breaks = 1:nrow(m1d_group_means)) +
  xaxis_text(angle = 90, vjust = 0.5) +
  xlab("Translocation_id")
```
- Uncertainty intervals appear incorrect for the binomial model, but not sure how to adjust them.  
- Based on mean values, the model does an adequate job of predicting survival from group-level and individual-level characteristics.  
- What explains the low predicted survival at 70370 (both translocations), and the high versus low predicted survival at 70550 for the 1st and 2nd translocations, respectively? Both sites seem to contain high-quality habitat (i.e., high elevation).  
- Predicted survival is based on the assumption that only males are included. Run another set of models for females?  

### Plot posterior plausible relationships between each predictor and the probability of frog survival (conditional effects)
- As described in https://rdrr.io/cran/brms/man/conditional_effects.brmsfit.html, "When creating conditional_effects for a particular predictor (or interaction of two predictors), one has to choose the values of all other predictors to condition on. By default, the mean is used for continuous variables and the reference category is used for factors, but you may change these values via argument conditions. This also has an implication for the points argument: In the created plots, only those points will be shown that correspond to the factor levels actually used in the conditioning, in order not to create the false impression of bad model fit, where it is just due to conditioning on certain factor levels."  
- Using rstanarm, I was unable to do this as expected: starting with data = d1 and using "newdata = " to specify that predictions be based on data in d1_elevz (for example), in which non-plotted continuous predictors are set to their mean values. Not sure whether my alternative approach is correct.  
- Approach used: instead of starting the pipeline with d1, I started with a new dataset in which all variables except the one of interest were set to their mean values (continuous variables) or reference levels (categorical variables). This produced plots/results very similar to those produced in brms using the conditional_effects function. 
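The approach described above amounts to building a reference grid by hand; a generic sketch of the idea (the helper name `make_reference_grid` is mine; the d1_elevz-style datasets below are built explicitly instead):

```{r}
# Hold every predictor except the focal one at its mean (continuous) or
# first level (factor/character), mirroring the default conditioning
# performed by brms::conditional_effects.
make_reference_grid <- function(d, focal) {
  for (v in setdiff(names(d), focal)) {
    d[[v]] <- if (is.numeric(d[[v]])) mean(d[[v]]) else levels(factor(d[[v]]))[1]
  }
  d
}

# Hypothetical example: vary elev_z while fixing the other predictors
d <- data.frame(elev_z = c(-1, 0, 1, 2),
                male_c = factor(c("a", "b", "a", "b")),
                snowt1_z = c(2, 4, 6, 8))
make_reference_grid(d, "elev_z")
```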

#### Effect of elevation
```{r}
d1_elevz <- d1 %>% 
  mutate(snowt1_z = mean(snowt1_z), 
         male_c = recode(male_c, "-0.452" = "0.548"),
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_elevz %>% distinct(snowt1_z, male_c, donor) # ensure variables were set to correct values
  
d1_elevz %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = elev_z, y = surv)) +
    geom_line(aes(y = .epred, group = .draw), color = "blue", size = 0.2, alpha = 0.5) +
#    geom_point(aes(y = .epred)) +
  labs(x = "Elevation of recipient site (standardized)", y = "Probability of 1-year frog survival")


# d1 %>%  # Test of add_predicted_draws used for similar purpose - unsuccessful
#   select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
#   add_predicted_draws(m1d_reduce, ndraws = 100, re_formula = NA, newdata = d1_elevz) %>% View()
#   ggplot(aes(x = elev_z, y = surv)) +
#     geom_line(aes(y = .epred, group = .draw), color = "blue", size = 0.2, alpha = 0.5) +
#     geom_point(aes(y = .epred)) +
#   labs(x = "Elevation of recipient site (standardized)", y = "Probability of 1-year frog survival")
```

#### Effect of winter severity in year t1
```{r}
d1_snowt1z <- d1 %>% 
  mutate(elev_z = mean(elev_z), 
         male_c = recode(male_c, "-0.452" = "0.548"),
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_snowt1z %>% distinct(elev_z, male_c, donor) # ensure variables were set to correct values
  
d1_snowt1z %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = snowt1_z, y = surv)) +
    geom_line(aes(y = .epred, group = .draw), color = "blue", size = 0.2, alpha = 0.5) +
#    geom_point(aes(y = .epred)) +
  labs(x = "Winter severity in year following translocation (standardized)", y = "Probability of 1-year frog survival") 
```

#### Effect of sex
```{r}
d1_malec <- d1 %>% 
  mutate(elev_z = mean(elev_z), 
         snowt1_z = mean(snowt1_z),
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_malec %>% distinct(elev_z, snowt1_z, donor) # ensure variables were set to correct values
  
d1_malec %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = male_c, y = surv)) +
    geom_boxplot(aes(y = .epred, x = male_c)) +
#    geom_point(aes(y = .epred)) +
    labs(x = "Sex (centered)", y = "Probability of 1-year frog survival")
```

#### Effect of donor
```{r}
d1_donor <- d1 %>% 
  mutate(elev_z = mean(elev_z), 
         snowt1_z = mean(snowt1_z),
         male_c = recode(male_c, "-0.452" = "0.548"))

d1_donor %>% distinct(elev_z, snowt1_z, male_c) # ensure variables were set to correct values
  
d1_donor %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = donor, y = surv)) +
    geom_boxplot(aes(y = .epred, x = donor)) +
#    geom_point(aes(y = .epred)) +
    labs(x = "Donor population", y = "Probability of 1-year frog survival")
```

#### Joint effects of elevation and sex
```{r}
d1_elevz_malec <- d1 %>% 
  mutate(snowt1_z = mean(snowt1_z), 
         donor = recode(donor, "70567" = "70459", "72996" = "70459"))

d1_elevz_malec %>% distinct(snowt1_z, donor) # ensure variables were set to correct values
  
pal <- c("#0d0887", "#f89540")
d1_elevz_malec %>% 
  select(site_id, translocation_id, snowt1_z, elev_z, male_c, donor, surv) %>% 
  add_epred_draws(m1d_reduce, ndraws = 100, re_formula = NA) %>% 
  ggplot(aes(x = elev_z, y = surv, color = male_c)) +
    geom_line(aes(y = .epred, group = paste(male_c, .draw)), size = 0.4, alpha = 0.5) +
#    geom_point(aes(y = .epred)) +
    labs(x = "Elevation of recipient site (standardized)", y = "Probability of 1-year frog survival") +
    scale_color_manual(values = pal)  + 
    theme_classic()
```


## For comparison, fit model using brms and conditional_effects function to plot variable-specific effects
### Fit model
```{r}
m1d_reduce_brms <- brm(
  surv ~ snowt1_z + elev_z + male_c + donor + (1 | translocation_id),
  data = d1,
  family = bernoulli(),
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Plot conditional effects
```{r}
plot(conditional_effects(m1d_reduce_brms, spaghetti = TRUE, mean = TRUE, ndraws = 100), ask = FALSE)
```


# Model 2: Survival of frogs from first translocation as predictor of survival in subsequent translocations
## Load packages

```{r}
library(tidyverse)
library(brms)
library(rstanarm)
library(bayesplot)
library(tidybayes)
library(loo)
library(broom.mixed)
library(janitor)

source("classification_summary.R")
```

## Create frog translocation dataset for analysis  
* Read in data, create variables for analysis, standardize continuous variables. 
* arm package used to rescale continuous variables (subtract mean, divide by 2 SD), manual centering of binary variables.
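For a numeric vector with no missing values, the rescaling applied by arm::rescale can be reproduced by hand; a minimal sketch (the helper name `rescale2` is mine):

```{r}
# arm::rescale for continuous input: center at the mean and divide by 2
# standard deviations (Gelman 2008), which puts continuous predictors on
# roughly the same scale as centered binary predictors.
rescale2 <- function(x) (x - mean(x)) / (2 * sd(x))

x <- c(2800, 2950, 3100, 3250, 3400)  # hypothetical elevations (m)
rescale2(x)  # rescaled values have mean 0 and SD 0.5
```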
```{r}
d2 <- read_csv(here::here("data", "clean", "frog_translocation_final.csv")) %>% 
  select(site_id, year, pit_tag_ref, survival) %>% 
  mutate(
    surv = if_else(survival < 0.5, 0, 1), 
    across(c(site_id, year), as.factor),
    surv = as.integer(surv))

# Add a descriptive translocation_id
d2 <- d2 %>% 
  distinct(site_id, year) %>% 
  arrange(site_id, year) %>% 
  group_by(site_id) %>% 
  mutate(transno = seq_len(n())) %>% 
  ungroup(site_id) %>% 
  unite(translocation_id, c("site_id", "transno"), remove = FALSE) %>% 
  mutate(translocation_id = as.factor(translocation_id)) %>% 
  inner_join(d2, by = c("site_id", "year")) %>% 
  relocate(translocation_id, .after = site_id) %>% 
  relocate(transno, .after = translocation_id)

```

## Restructure dataset
```{r}
# Remove sites that received only 1 translocation
d2 <- d2 %>% 
  distinct(site_id, transno) %>% 
  group_by(site_id) %>% 
  mutate(trans_max = max(transno)) %>% 
  ungroup(site_id) %>% 
  select(-transno) %>% 
  filter(trans_max > 1) %>% 
  distinct(site_id, trans_max) %>% 
  inner_join(d2, by = c("site_id"))

# Create predictor from survival of first translocated cohort
d2 <- d2 %>% 
  filter(transno == 1) %>% 
  group_by(site_id) %>% 
  mutate(survival_trans1 = mean(survival)) %>%    # use mean or median?
  ungroup(site_id) %>% 
  distinct(site_id, survival_trans1) %>% 
  mutate(across(c(survival_trans1), round, 3)) %>% 
  inner_join(d2, by = c("site_id")) %>% 
  filter(transno != 1) %>% 
  relocate(translocation_id, .after = site_id) %>% 
  mutate(survival_trans1_z = arm::rescale(survival_trans1), .after = survival_trans1) %>% 
  select(-transno, -trans_max) 
```

## Model with non-standardized predictor, no grouping variable
### Model specification
```{r}
m2a <- stan_glm(
  surv ~ survival_trans1,
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### List model variable names
```{r}
get_variables(m2a)
```

### Priors & chain diagnostics
```{r}
prior_summary(m2a)

b2 <- c("(Intercept)", "survival_trans1")

mcmc_trace(m2a, size = 0.1, pars = b2)
mcmc_dens_overlay(m2a, pars = b2)

summary(m2a)
```

### Posterior predictive check
```{r}
pp_check(m2a, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m2a <- loo(m2a, cores = 4, save_psis = TRUE)
print(loo_m2a)
```

### Posterior analysis
```{r}
tidy(m2a, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2a, area_method = "equal height", prob = 0.90, pars = b2) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with unstandardized predictor & no grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m2a))
```

## Model with standardized predictor, no grouping variable
### Model specification
```{r}
m2b <- stan_glm(
  surv ~ survival_trans1_z,
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### List model variable names
```{r}
get_variables(m2b)
```

### Priors & chain diagnostics
```{r}
prior_summary(m2b)

b3 <- c("(Intercept)", "survival_trans1_z")

mcmc_trace(m2b, size = 0.1, pars = b3)
mcmc_dens_overlay(m2b, pars = b3)

summary(m2b)
```

### Posterior predictive check
```{r}
pp_check(m2b, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m2b <- loo(m2b, cores = 4, save_psis = TRUE)
print(loo_m2b)
```

### Posterior analysis
```{r}
tidy(m2b, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2b, area_method = "equal height", prob = 0.90, pars = b3) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictor & no grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m2b))
```

## Model with standardized predictor, translocation_id grouping variable
### Model specification
```{r}
m2c <- stan_glmer(
  surv ~ survival_trans1_z + (1 | translocation_id),
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m2c)

b3 <- c("(Intercept)", "survival_trans1_z")

mcmc_trace(m2c, size = 0.1, pars = b3)
mcmc_dens_overlay(m2c, pars = b3)

summary(m2c)
```

### Posterior predictive check
```{r}
pp_check(m2c, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m2c <- loo(m2c, cores = 4, save_psis = TRUE)
print(loo_m2c)
```

### Posterior analysis
```{r}
tidy(m2c, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2c, area_method = "equal height", prob = 0.90, pars = b3) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictor & translocation_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m2c))

# Frog survival by translocation_id
mcmc_areas_ridges(m2c, regex_pars = "translocation_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each translocation_id")
```

## Model with standardized predictor, site_id grouping variable
### Model specification
```{r}
m2d <- stan_glmer(
  surv ~ survival_trans1_z + (1 | site_id),
  data = d2,
  family = binomial,
  chains = 4, 
  iter = 5000*2, 
  cores = 4,
  seed = 84735)
```

### Priors & chain diagnostics
```{r}
prior_summary(m2d)

b3 <- c("(Intercept)", "survival_trans1_z")

mcmc_trace(m2d, size = 0.1, pars = b3)
mcmc_dens_overlay(m2d, pars = b3)

summary(m2d)
```

### Posterior predictive check
```{r}
pp_check(m2d, plotfun = "stat") +
    xlab("Frog survival rate")
```

### Calculate loo
```{r}
loo_m2d <- loo(m2d, cores = 4, save_psis = TRUE)
print(loo_m2d)
```

### Posterior analysis
```{r}
tidy(m2d, conf.int = TRUE, conf.level = 0.90)

mcmc_areas(m2d, area_method = "equal height", prob = 0.90, pars = b3) +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Model with standardized predictor & site_id as grouping variable",
          "Posterior distributions with medians & 90% uncertainty intervals")

mean(bayes_R2(m2d))

# Frog survival by site_id
mcmc_areas_ridges(m2d, regex_pars = "site_id") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", lwd = 0.5) +  
  ggtitle("Frog survival for each site_id")
```

### Compare 3 models with different grouping structures
```{r}
loo_compare(loo_m2b, loo_m2c, loo_m2d)
```
- Based on expected log-predictive densities (ELPD), models with and without grouping variables have similar posterior predictive accuracy (elpd_diff of m2b relative to m2d is -3.1 $\pm$ 5.8; the 2 SE interval, -8.9 to 2.7, overlaps 0).  
- Conclusion from all 3 models is the same: Average frog survival of first translocation is an important predictor of survival of individual frogs in subsequent translocations.  

## Posterior classification accuracy of best models
```{r}
# Generate confusion matrix for m2b
set.seed(84735)
classification_summary(m2b, data = d2, cutoff = 0.5)

# Generate confusion matrix for m2d
set.seed(84735)
classification_summary(m2d, data = d2, cutoff = 0.5)
```
- As expected from loo results, models with and without grouping variable have the same posterior predictive accuracy. 


## TO DO
- Validate priors, tune as necessary. Add final priors to model specification (in case the defaults change).  
- Check approach to plotting conditional effects.  
- Try un-standardizing predictor values in conditional-effects plots (and other plots?) to produce more intuitive visualizations.  


